Wenhai created ASTERIXDB-1776:
---------------------------------
Summary: Data loss in many multi-partitions
Key: ASTERIXDB-1776
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1776
Project: Apache AsterixDB
Issue Type: Bug
Components: Hyracks Core
Environment: MAC/Linux
Reporter: Wenhai
Assignee: Ian Maxon
Priority: Critical
Attachments: cc.log, demo.xml, execute.log, tpch_node1.log,
tpch_node2.log
Description: If we configure more than 24 partitions in each NC, we
consistently lose almost half of the partitions, with no error messages or
log entries.
Schema:
{noformat}
drop dataverse tpch if exists;
create dataverse tpch;
use dataverse tpch;
create type LineItemType as closed {
l_orderkey: int32,
l_partkey: int32,
l_suppkey: int32,
l_linenumber: int32,
l_quantity: int32,
l_extendedprice: double,
l_discount: double,
l_tax: double,
l_returnflag: string,
l_linestatus: string,
l_shipdate: string,
l_commitdate: string,
l_receiptdate: string,
l_shipinstruct: string,
l_shipmode: string,
l_comment: string
}
create dataset LineItem(LineItemType)
primary key l_orderkey, l_linenumber;
load dataset LineItem
using localfs
(("path"="127.0.0.1:///path-to-tpch-data/tpch0.001/lineitem.tbl"),("format"="delimited-text"),("delimiter"="|"));
{noformat}
Query:
{noformat}
use dataverse tpch;
let $s := count(
for $d in dataset LineItem
return $d
)
return $s
{noformat}
Return:
{noformat}
6005
{noformat}
Command:
{noformat}
managix stop -n tpch
managix start -n tpch
{noformat}
Query:
{noformat}
use dataverse tpch;
let $s := count(
for $d in dataset LineItem
return $d
)
return $s
{noformat}
Return:
{noformat}
4521
{noformat}
We lose about a quarter of the records (1,484 of 6,005) in this tiny test.
When we scale the TPC-H data up to 200 GB across 192 partitions (8 X 24), we
should get 1.2 billion records, but the query returned only 0.45 billion!
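As a quick check on the counts reported above, the fraction of missing records can be computed directly. This is only a minimal sketch: the helper name `loss_fraction` is hypothetical, and the large-scale figures are the rounded values quoted in this report (1.2 billion expected, 0.45 billion returned), not exact counts.

```python
def loss_fraction(expected: int, observed: int) -> float:
    """Return the fraction of records missing relative to the expected count."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - observed) / expected


# Counts from the small test: 6005 before the restart, 4521 after.
small = loss_fraction(6005, 4521)

# Rounded counts from the 200 GB run (assumed approximate).
large = loss_fraction(1_200_000_000, 450_000_000)

print(f"small test: {small:.1%} lost")  # small test: 24.7% lost
print(f"200 GB run: {large:.1%} lost")  # 200 GB run: 62.5% lost
```

The loss ratio differs between the two runs, so it does not look like a fixed fraction of the data.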