[ 
https://issues.apache.org/jira/browse/HBASE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600506#comment-13600506
 ] 

Keith Turner commented on HBASE-8031:
-------------------------------------

Cool to see HBase adopt this test.  I am not positive, but it seems like this 
patch contains the change that we discussed [1] on GitHub.  I am 
still conceptually opposed to this change.  I think in some situations a mapper 
rewriting the same data, because the task failed previously, could cover up 
the fact that data was lost in HBase/Accumulo.  Since I created the test to 
detect data loss, the change bothers me a bit.  Granted, the situation seems 
unlikely, but when running the test on large clusters the unlikely sometimes 
becomes likely.

[1] : 
https://github.com/enis/goraci/commit/c320c50f5a5c562a13fa7a77b8da46c4e65e4f41#commitcomment-1489337


                
> Adopt goraci as an Integration test
> -----------------------------------
>
>                 Key: HBASE-8031
>                 URL: https://issues.apache.org/jira/browse/HBASE-8031
>             Project: HBase
>          Issue Type: Improvement
>          Components: test
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.95.0, 0.98.0, 0.94.7
>
>         Attachments: hbase-8031_v1.patch
>
>
> As you might know, I am a big fan of the goraci test that Keith Turner has 
> developed, which in turn is inspired by the Accumulo test called Continuous 
> Ingest. 
> As much as I hate to say it, having to rely on Gora and an external GitHub 
> library makes using this lib cumbersome. And lately we had to use this for 
> testing against secure clusters and with Hadoop2, which Gora does not support 
> for now. 
> So, I am proposing we add this test as an IT in the HBase code base so that 
> all HBase devs can benefit from it.
> The original source code can be found here:
>  * https://github.com/keith-turner/goraci
>  * https://github.com/enis/goraci/
> From the javadoc:
> {code}
> Apache Accumulo [0] has a simple test suite that verifies that data is not
>  * lost at scale. This test suite is called continuous ingest. This test runs
>  * many ingest clients that continually create linked lists containing 25
>  * million nodes. At some point the clients are stopped and a map reduce job is
>  * run to ensure no linked list has a hole. A hole indicates data was lost.
>  *
>  * The nodes in the linked list are random. This causes each linked list to
>  * spread across the table. Therefore if one part of a table loses data, then it
>  * will be detected by references in another part of the table.
>  *
>  * Below is a rough sketch of how data is written. For specific details look at
>  * the Generator code.
>  *
>  * 1 Write out 1 million nodes
>  * 2 Flush the client
>  * 3 Write out 1 million that reference previous million
>  * 4 If this is the 25th set of 1 million nodes, then update 1st set of
>  *   million to point to last
>  * 5 goto 1
>  *
>  * The key is that nodes only reference flushed nodes. Therefore a node should
>  * never reference a missing node, even if the ingest client is killed at any
>  * point in time.
>  *
>  * Some ASCII art time:
>      * [ . . . ] represents one batch of random longs of length WIDTH
>      *
>      *                _________________________
>      *               |                  ______ |
>      *               |                 |      ||
>      *             __+_________________+_____ ||
>      *             v v                 v     |||
>      * first   = [ . . . . . . . . . . . ]   |||
>      *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>      *             | | | | | | | | | | |     |||
>      * prev    = [ . . . . . . . . . . . ]   |||
>      *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
>      *             | | | | | | | | | | |     |||
>      * current = [ . . . . . . . . . . . ]   |||
>      *                                       |||
>      * ...                                   |||
>      *                                       |||
>      * last    = [ . . . . . . . . . . . ]   |||
>      *             | | | | | | | | | | |-----|||
>      *             |                 |--------||
>      *             |___________________________|
> {code}
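
The write order and the hole check described in the javadoc above can be
sketched in a few lines. This is a minimal in-memory model, not the goraci or
HBase code: the class ContinuousIngestSketch, the methods generate and
countHoles, the NO_PREV marker, and the small width and wrap values are all
hypothetical names and numbers chosen for illustration, and a HashMap stands in
for the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical in-memory model of the write pattern; a HashMap stands in for the table.
final class ContinuousIngestSketch {
    // Marker meaning "this node does not reference anything yet".
    static final long NO_PREV = -1L;

    // Writes `wrap` batches of `width` nodes and returns key -> referenced key.
    static Map<Long, Long> generate(int width, int wrap, long seed) {
        Random rnd = new Random(seed);
        Map<Long, Long> table = new HashMap<>();
        long[] first = null;
        long[] prev = null;
        for (int batch = 0; batch < wrap; batch++) {
            long[] current = new long[width];
            Map<Long, Long> buffer = new HashMap<>(); // writes not yet flushed
            for (int i = 0; i < width; i++) {
                long key;
                do {
                    key = rnd.nextLong() & Long.MAX_VALUE; // random non-negative key
                } while (table.containsKey(key) || buffer.containsKey(key));
                current[i] = key;
                // A new node only references a node from an already flushed batch.
                buffer.put(key, prev == null ? NO_PREV : prev[i]);
            }
            table.putAll(buffer); // "flush the client"
            if (first == null) {
                first = current;
            }
            prev = current;
        }
        // After the last batch, point the first batch at the last one to close the loop.
        for (int i = 0; i < width; i++) {
            table.put(first[i], prev[i]);
        }
        return table;
    }

    // Counts references that point at a key missing from the table (a "hole").
    static int countHoles(Map<Long, Long> table) {
        int holes = 0;
        for (long ref : table.values()) {
            if (ref != NO_PREV && !table.containsKey(ref)) {
                holes++;
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        Map<Long, Long> table = generate(10, 25, 42L);
        System.out.println("nodes: " + table.size());
        System.out.println("holes before loss: " + countHoles(table));
        table.remove(table.keySet().iterator().next()); // simulate one lost row
        System.out.println("holes after loss: " + countHoles(table));
    }
}
```

In this model every node is referenced by exactly one other node once the loop
is closed, so dropping a single row leaves exactly one dangling reference for
the check to find. The real test does the verification as a map reduce job
over the table rather than an in-memory scan.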

