[
https://issues.apache.org/jira/browse/HBASE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596774#comment-13596774
]
Anoop Sam John commented on HBASE-8031:
---------------------------------------
Thanks Enis for bringing this up. We were also in need of a testing framework
like this. Looking forward to your patch.
> Adopt goraci as an Integration test
> -----------------------------------
>
> Key: HBASE-8031
> URL: https://issues.apache.org/jira/browse/HBASE-8031
> Project: HBase
> Issue Type: Improvement
> Components: test
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 0.95.0, 0.98.0, 0.94.7
>
>
> As you might know, I am a big fan of the goraci test that Keith Turner has
> developed, which in turn is inspired by the Accumulo test called Continuous
> Ingest.
> As much as I hate to say it, having to rely on gora and an external github
> library makes using this lib cumbersome. And lately we had to use this for
> testing against secure clusters and with Hadoop2, which gora does not support
> for now.
> So, I am proposing we add this test as an IT in the HBase code base so that
> all HBase devs can benefit from it.
> The original source code can be found here:
> * https://github.com/keith-turner/goraci
> * https://github.com/enis/goraci/
> From the javadoc:
> {code}
> * Apache Accumulo [0] has a simple test suite that verifies that data is not
> * lost at scale. This test suite is called continuous ingest. This test runs
> * many ingest clients that continually create linked lists containing 25
> * million nodes. At some point the clients are stopped and a map reduce job is
> * run to ensure no linked list has a hole. A hole indicates data was lost.
> *
> * The nodes in the linked list are random. This causes each linked list to
> * spread across the table. Therefore if one part of a table loses data, then it
> * will be detected by references in another part of the table.
> *
> * Below is a rough sketch of how data is written. For specific details look at
> * the Generator code.
> *
> * 1 Write out 1 million nodes
> * 2 Flush the client
> * 3 Write out 1 million that reference previous million
> * 4 If this is the 25th set of 1 million nodes, then update 1st set of
> *   million to point to last
> * 5 goto 1
> *
> * The key is that nodes only reference flushed nodes. Therefore a node should
> * never reference a missing node, even if the ingest client is killed at any
> * point in time.
> *
> * Some ASCII art time:
> * [ . . . ] represents one batch of random longs of length WIDTH
> *
> *                _________________________
> *               |                  ______ |
> *               |                 |      ||
> *             __+_________________+_____ ||
> *             v v                 v     |||
> * first   = [ . . . . . . . . . . . ]   |||
> *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
> *             | | | | | | | | | | |     |||
> * prev    = [ . . . . . . . . . . . ]   |||
> *             ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^     |||
> *             | | | | | | | | | | |     |||
> * current = [ . . . . . . . . . . . ]   |||
> *                                       |||
> *                   ...                 |||
> *                                       |||
> * last    = [ . . . . . . . . . . . ]   |||
> *             | | | | | | | | | | |-----|||
> *             |                 |--------||
> *             |___________________________|
> {code}
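
The write and verify steps described in the javadoc can be illustrated with a
small in-memory sketch. This is not the goraci Generator or Verify code: the
class name LinkedListSketch, the methods generate and findHoles, and the use of
a HashMap as a stand-in for the table are all assumptions made for
illustration. The one-position shift used when closing the loop is also an
assumption, chosen so that all batches join into a single list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory sketch of the continuous-ingest linked list. */
public class LinkedListSketch {

    /**
     * Writes `wrap` batches of `width` random nodes into `table`
     * (node key -> key of the node it references, null if none yet).
     */
    static void generate(Map<Long, Long> table, int width, int wrap, Random rnd) {
        long[] first = null;
        long[] prev = null;
        for (int batch = 0; batch < wrap; batch++) {
            long[] current = new long[width];
            Map<Long, Long> buffer = new HashMap<>(); // unflushed client-side writes
            for (int i = 0; i < width; i++) {
                long key;
                do {
                    key = rnd.nextLong();
                } while (table.containsKey(key) || buffer.containsKey(key));
                current[i] = key;
                // Reference only nodes from the previous, already flushed batch.
                buffer.put(key, prev == null ? null : prev[i]);
            }
            table.putAll(buffer); // "flush the client"
            if (first == null) {
                first = current;
            }
            prev = current;
        }
        // Close the loop: update the first batch to point at the last batch.
        // Assumption: shifting by one position links everything into one list.
        for (int i = 0; i < width; i++) {
            table.put(first[i], prev[(i + 1) % width]);
        }
    }

    /** Returns every referenced key that is missing from the table (a "hole"). */
    static List<Long> findHoles(Map<Long, Long> table) {
        List<Long> holes = new ArrayList<>();
        for (Long ref : table.values()) {
            if (ref != null && !table.containsKey(ref)) {
                holes.add(ref);
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        Map<Long, Long> table = new HashMap<>();
        generate(table, 4, 3, new Random(42L));
        System.out.println("holes before data loss: " + findHoles(table).size());

        // Simulate losing one row, then verify again.
        Long victim = table.keySet().iterator().next();
        table.remove(victim);
        System.out.println("holes after data loss: " + findHoles(table).size());
    }
}
```

Because each batch is flushed before any later batch references it, stopping
generate between batches leaves no dangling references. The real test uses a
MapReduce job for verification; findHoles only mimics that check on a map.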