[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15512156#comment-15512156
 ] 

Ian Maxon commented on ASTERIXDB-1636:
--------------------------------------

I've found and possibly fixed one issue so far that was directly causing this. 
AqlMetadataProvider and SecondaryInvertedIndexOperationsHelper compute the 
filter index position differently. The latter puts the position at the number 
of PKs plus the number of SKs, which seems to be correct; the former yields a 
position that causes an index-out-of-bounds error. The former is used after 
restart, while the latter is used when the index is created. This would explain 
why everything seems to work just fine until restart. 
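
To make the mismatch concrete, here is a minimal, self-contained sketch. It is 
not AsterixDB code: the class name, the method names, and the field layout 
(secondary keys, then primary keys, then one trailing filter field) are 
assumptions for illustration only. The incorrect position below is also 
hypothetical, since the exact value AqlMetadataProvider computes is not stated 
here; the sketch only shows how any position past num PKs + num SKs runs off 
the end of the tuple.

```java
public class FilterPositionSketch {

    // Position described as correct above: num PKs + num SKs.
    // Assumes the filter field is the last field of the tuple.
    static int filterFieldIndex(int numPrimaryKeys, int numSecondaryKeys) {
        return numPrimaryKeys + numSecondaryKeys;
    }

    // Returns true if the given position is a valid index into the tuple.
    static boolean isValidPosition(Object[] tupleFields, int position) {
        return position >= 0 && position < tupleFields.length;
    }

    public static void main(String[] args) {
        // Hypothetical tuple: one secondary key, one primary key, one filter value.
        Object[] tupleFields = { "token", 42L, 1473211763L };

        int atCreation = filterFieldIndex(1, 1);
        System.out.println("creation path valid: "
                + isValidPosition(tupleFields, atCreation));

        // Hypothetical wrong value standing in for the post-restart path.
        int afterRestart = atCreation + 1;
        System.out.println("restart path valid: "
                + isValidPosition(tupleFields, afterRestart));

        try {
            Object filter = tupleFields[afterRestart];
            System.out.println(filter);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of bounds at position " + afterRestart);
        }
    }
}
```

Because the two code paths only disagree after the index is reopened, a check 
like isValidPosition would pass on creation and fail on the first access after 
restart, which matches the observed behavior.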

There also seems to be an issue or two with how filters are stored for inverted 
indices in general, however. One is that an "identity" on-disk inverted 
index (so 0 tuples) may have a filter page, and this will cause merges 
involving it to fail. The other is that there appears to be a way for tuples 
going into the inverted index to bypass updating the filters entirely, but I'm 
less sure of this one; I still need to dig into it more. 

> Feed cannot re-ingest after cluster restart
> -------------------------------------------
>
>                 Key: ASTERIXDB-1636
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1636
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Feeds, Storage
>         Environment: master
> commit c89d668f68e5430a6ba4455daf8f9cd6f7040dd8
> Date:   Tue Sep 6 18:29:23 2016 -0700
>            Reporter: Jianfeng Jia
>            Assignee: Ian Maxon
>            Priority: Blocker
>              Labels: soon
>
> Here are steps to reproduce the problem:
> 1. start the cluster
> 2. ingest the initial data using file feed 
> [script|https://gist.github.com/JavierJia/9ed7744c938c5cb66aba63007b86a987]
> 2.1: file for ingestion: 
> https://drive.google.com/open?id=0B423M7wGZj9dNE5HenFqcjhuUFk
> 3. start another socket feed 
> [script|https://gist.github.com/JavierJia/565cefd9322df35c7abeefbfcfcee9f8] 
> to ingest the live data  
> 4. restart the cluster
> 5. start that live socket feed again.
> 6. With your own Twitter credentials, you can use [this 
> script|https://github.com/ISG-ICS/cloudberry/blob/master/streamFeed.sh] to 
> ingest tweets.
> 7. It will send at most 280 tweets and then stop forever.
> [~imaxon] [~idleft] if you can help that will be great.
> related to ASTERIXDB-1264



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
