Ildar Absalyamov created ASTERIXDB-2141:
-------------------------------------------
Summary: Pre-sorted bulkload failure
Key: ASTERIXDB-2141
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2141
Project: Apache AsterixDB
Issue Type: Bug
Reporter: Ildar Absalyamov
Assignee: Ian Maxon
Bulk loading pre-sorted input fails due to a concurrency issue in the
hash_partition_merge connector. The following DDL generates an "HYR0046: Unsorted
load input" error.
The error is non-deterministic, but the chance of hitting it increases with the
length of the input.
{code:java}
drop dataverse experiments if exists;
create dataverse experiments;
use dataverse experiments;
set hash_merge "true";
create type TweetMessageType as open {
tweetid: int64
}
create dataset Tweets(TweetMessageType) primary key tweetid;
load dataset Tweets using localfs
(("path"="asterix_nc1://tweets.adm,asterix_nc2://tweets2.adm"),("format"="adm"))
pre-sorted;
{code}
The error occurs even though the input splits (tweets.adm and tweets2.adm) are
individually sorted:
{code:title=tweets.adm}
{"tweetid":int64("2")}
{"tweetid":int64("4")}
{"tweetid":int64("6")}
{"tweetid":int64("8")}
{"tweetid":int64("10")}
{"tweetid":int64("12")}
{"tweetid":int64("14")}
{"tweetid":int64("16")}
{"tweetid":int64("18")}
{"tweetid":int64("20")}
{code}
{code:title=tweets2.adm}
{"tweetid":int64("1")}
{"tweetid":int64("3")}
{"tweetid":int64("5")}
{"tweetid":int64("7")}
{"tweetid":int64("9")}
{"tweetid":int64("11")}
{"tweetid":int64("13")}
{"tweetid":int64("15")}
{"tweetid":int64("17")}
{"tweetid":int64("19")}
{code}
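For illustration only, here is a minimal Python sketch of the ordering the load expects from these two splits. It is not the Hyracks connector code. The names split_nc1, split_nc2 and first_unsorted_index are hypothetical, and the strictly-increasing check is an assumed stand-in for whatever validation raises HYR0046. An order-preserving merge of the two sorted splits passes the check, while an arrival-order combination (here, simply one split followed by the other) trips it.

```python
import heapq

# Keys copied from tweets.adm and tweets2.adm in this report.
split_nc1 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
split_nc2 = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]


def first_unsorted_index(keys):
    """Return the index of the first key that is not greater than its
    predecessor, or None if the keys are strictly increasing.

    Assumption: a stand-in for the sortedness check behind HYR0046.
    """
    for i in range(1, len(keys)):
        if keys[i] <= keys[i - 1]:
            return i
    return None


# Order-preserving merge: always emit the smallest head among the inputs.
merged = list(heapq.merge(split_nc1, split_nc2))

# One possible bad interleaving: frames consumed in arrival order,
# all of one split followed by all of the other.
concatenated = split_nc1 + split_nc2

print(first_unsorted_index(merged))
print(first_unsorted_index(concatenated))
```

Any interleaving that does not always take the smallest pending key can place a smaller key after a larger one, which would fit the observation that longer inputs hit the error more often.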