[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15489870#comment-15489870
 ] 

Wail Alkowaileet commented on ASTERIXDB-1414:
---------------------------------------------

Actually, that feature is one of the best (among other things :-)).
I convinced a lot of people to use AsterixDB for Twitter just because of this 
feature, as they suffered from duplicates when they used the API directly. 
Duplicates can be far more expensive to clean up later, and they inflate storage size.

In one case, my colleagues collected tweets from the regular API and 
found that each tweet was repeated 30 times, especially for highly selective 
keywords. So deduplicating reduces the number of tweets by a factor of 30.
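
To illustrate the effect, here is a minimal Python sketch. It is not AsterixDB code: the function name `insert_unique`, the `id` field, and the synthetic stream are assumptions made up for this example. It only shows how rejecting inserts whose key already exists collapses a stream in which every tweet arrives 30 times.

```python
def insert_unique(store, record, key="id"):
    """Insert record into store unless its key already exists.

    Returns True if the record was inserted, False if it was a duplicate.
    (Hypothetical helper; it stands in for a primary-key check.)
    """
    k = record[key]
    if k in store:
        return False  # duplicate key: skip instead of storing again
    store[k] = record
    return True


# Hypothetical stream in which each of 5 tweets arrives 30 times.
stream = [{"id": i, "text": f"tweet {i}"} for i in range(5) for _ in range(30)]

store = {}
inserted = sum(insert_unique(store, t) for t in stream)
print(len(stream), inserted)  # 150 5
```

Only one copy per tweet id is kept, so 150 incoming records become 5 stored ones.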

> Insert tweets with existed keys
> -------------------------------
>
>                 Key: ASTERIXDB-1414
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1414
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Feeds
>            Reporter: Xikui Wang
>            Assignee: Xikui Wang
>            Priority: Minor
>
> When using the tweet id as the key, a duplicate key exception sometimes 
> pops up in the terminal. Here is the error message.
> {quote}
> org.apache.asterix.common.exceptions.FrameDataException: org.apache.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException: Failed to insert key since key already exists.
>       at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:120)
>       at org.apache.asterix.external.feed.dataflow.FeedRuntimeInputHandler.process(FeedRuntimeInputHandler.java:265)
>       at org.apache.asterix.external.feed.dataflow.FeedRuntimeInputHandler.nextFrame(FeedRuntimeInputHandler.java:127)
>       at org.apache.asterix.external.operators.FeedMetaStoreNodePushable.nextFrame(FeedMetaStoreNodePushable.java:176)
>       at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:349)
>       at org.apache.hyracks.control.nc.Task.run(Task.java:297)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException: Failed to insert key since key already exists.
>       at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.insert(LSMBTree.java:360)
>       at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:333)
>       at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:353)
>       at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:333)
>       at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceInsert(LSMTreeIndexAccessor.java:158)
>       at org.apache.asterix.common.dataflow.AsterixLSMInsertDeleteOperatorNodePushable.nextFrame(AsterixLSMInsertDeleteOperatorNodePushable.java:103)
>       ... 8 more
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
