[jira] [Commented] (ASTERIXDB-1917) FLUSH_LSN for disk components is not correctly set when a NC holds multiple partitions

ASF subversion and git services (JIRA) Tue, 23 May 2017 21:26:53 -0700

    [ 
https://issues.apache.org/jira/browse/ASTERIXDB-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022298#comment-16022298
 ]


ASF subversion and git services commented on ASTERIXDB-1917:
------------------------------------------------------------

Commit 1e51daacb8aacc42184e76fda1f2cd6b0eb2e824 in asterixdb's branch 
refs/heads/master from [~luochen01]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=1e51daa ]

ASTERIXDB-1917: FLUSH_LSN for disk components is not correctly set

-Fixed a bug that FLUSH_LSN for flushed disk components is not
correctly set (not increasing) when an NC has multiple partitions.
-Added LSMIOOperationCallback unit tests to cover this bug

Change-Id: If438e34f8f612458d81f618eea04c0c72c49a9fe
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1771
Reviewed-by: abdullah alamoudi <[email protected]>
Sonar-Qube: Jenkins <[email protected]>
Tested-by: Jenkins <[email protected]>
BAD: Jenkins <[email protected]>


> FLUSH_LSN for disk components is not correctly set when a NC holds multiple 
> partitions
> --------------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1917
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1917
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Hyracks, Storage
>            Reporter: Chen Luo
>            Assignee: Chen Luo
>         Attachments: asterix-build-configuration-lsm.xml, sample.zip
>
>
> When we flush a memory component of an index, we would set an LSN to the 
> result disk component. The LSN is set as the last operation which modifies 
> that memory component. Thus, given an index, the FLUSH_LSNs of its flushed 
> disk components should be increasing, i.e., later flushed components get 
> larger LSNs.
> However, currently I observed a bug that later flushed disk components get a 
> smaller LSN, which breaks this properly.
> A brief explanation of this bug is as follows. Suppose we have one dataset D, 
> and two partitions on one NC. Suppose D only has a primary index P. Further 
> suppose P1 and P2 are two partitioned indexes for the two partitions on this 
> NC. This implies P1 and P2 share the same PrimaryIndexOperationTracker, and 
> they would always be flushed together.
> The LSN for flushed disk components is ILSMIOOperationCallback. Now suppose 
> an index has two memory components. ILSMIOOperationCallback maintains an 
> array mutableLastLSNs of length 2 to track the FLUSH_LSN for two memory 
> components. Before scheduling each flush operation, 
> PrimaryIndexOperationTracker needs to call ILSMIOOperationCallback to set the 
> FLUSH_LSN.
> Now consider the following scenario (which happens but very rarely). 
> Initially,
> P1.mutableLastLSNs=[0,0]
> P2.mutableLastLSNs=[0,0]
> Suppose dataset D needs to be flushed, and mutableLastLSNs is set as follows:
> P1.mutableLastLSNs=[1, 0]
> P2.mutableLastLSNs=[1, 0]
> Then, suppose the flush operation of P2 is fast, and produces a disk 
> component P2.d1 (P2.d1.LSN = 1). Data continues to come into P2, and it needs 
> to be flushed again. However, P1 is still flushing the first memory 
> component. Then mutableLastLSNs become:
> P1.mutableLastLSNs=[1, 2]
> P2.mutableLastLSNs=[1, 2].
> Surprisingly, the flush operation of P2 is again fast, and produces a disk 
> component P2.d2 (P2.d2.LSN = 2). Still, data continues to come into P2, and 
> it needs to be flushed again. But P1 is still flushing the first memory 
> component. Then mutableLastLSNs become:
> P1.mutableLastLSNs=[3, 2]
> P2.mutableLastLSNs=[3, 2].
> At this time, P1 finishes its first flush operation, and produced the disk 
> component P1.d1 (P1.d1.LSN = 3). This in incorrect, since P1.d1.LSN should be 
> 1, not 3. Its original value is overwritten by the flush request of P2!
> To reproduce this bug, one needs to change the codebase slightly (i.e., 
> LSMBTreeIOOperationCallback). I added a member variable
> {code}
>  private volatile long prevLSN = 0;
> {code}
> and added one check in the getComponentLSN method:
> {code}
>  @Override
>     public long getComponentLSN(List<? extends ILSMComponent> diskComponents) 
> throws HyracksDataException {
>         if (diskComponents == null) {
>             // Implies a flush IO operation. --> moves the flush pointer
>             // Flush operation of an LSM index are executed sequentially.
>             synchronized (this) {
>                 long lsn = mutableLastLSNs[readIndex];
>                 if (!(prevLSN <= lsn)) {
>                     throw new IllegalStateException();
>                 }
>                 prevLSN = lsn;
>                 return lsn;
>             }
>         }
>         // Get max LSN from the diskComponents. Implies a merge IO operation 
> or Recovery operation.
>         long maxLSN = -1L;
>         for (ILSMComponent c : diskComponents) {
>             BTree btree = ((LSMBTreeDiskComponent) c).getBTree();
>             maxLSN = 
> Math.max(AbstractLSMIOOperationCallback.getTreeIndexLSN(btree), maxLSN);
>         }
>         return maxLSN;
>     }
> {code}
> Then, after starting AsterixDB using AsterixHyracksIntegrationUtil, you can 
> ingest the data and reproduce the bug using the following queries (you need 
> to replace the path_to_sample_data with the attached file):
> {code}
> drop dataverse twitter if exists;
> create dataverse twitter if not exists;
> use dataverse twitter
> create type typeUser if not exists as open {
>     id: int64,
>     name: string,
>     screen_name : string,
>     lang : string,
>     location: string,
>     create_at: date,
>     description: string,
>     followers_count: int32,
>     friends_count: int32,
>     statues_count: int64
> }
> create type typePlace if not exists as open{
>     country : string,
>     country_code : string,
>     full_name : string,
>     id : string,
>     name : string,
>     place_type : string,
>     bounding_box : rectangle
> }
> create type typeGeoTag if not exists as open {
>     stateID: int32,
>     stateName: string,
>     countyID: int32,
>     countyName: string,
>     cityID: int32?,
>     cityName: string?
> }
> create type typeTweet if not exists as open{
>     create_at : datetime,
>     id: int64,
>     "text": string,
>     in_reply_to_status : int64,
>     in_reply_to_user : int64,
>     favorite_count : int64,
>     coordinate: point?,
>     retweet_count : int64,
>     lang : string,
>     is_retweet: boolean,
>     hashtags : {{ string }} ?,
>     user_mentions : {{ int64 }} ? ,
>     user : typeUser,
>     place : typePlace?,
>     geo_tag: typeGeoTag
> }
> create dataset ds_tweet(typeTweet) if not exists primary key id 
> using compaction policy correlated-prefix 
> (("max-mergable-component-size"="134217728"),("max-tolerance-component-count"="5"))
>  with filter on create_at ;
> // with filter on create_at;
> //"using" "compaction" "policy" CompactionPolicy ( Configuration )? )?
> create feed TweetFeed using localfs
> (
>     ("path"="localhost:///path_to_sample_data"),
>     ("address-type"="nc"),
>     ("type-name"="typeTweet"),
>     ("format"="adm")
> );
> connect feed TweetFeed to dataset ds_tweet;
> start feed TweetFeed;
> {code}
> I attached the asterix-build-configuration-lsm.xml file and the sample data 
> file as below.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (ASTERIXDB-1917) FLUSH_LSN for disk components is not correctly set when a NC holds multiple partitions

Reply via email to