[jira] [Created] (ASTERIXDB-1776) Data loss in many multi-partitions

Wenhai (JIRA) Sat, 28 Jan 2017 01:11:46 -0800

Wenhai created ASTERIXDB-1776:
---------------------------------

             Summary: Data loss in many multi-partitions
                 Key: ASTERIXDB-1776
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1776
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: Hyracks Core
         Environment: MAC/Linux
            Reporter: Wenhai
            Assignee: Ian Maxon
            Priority: Critical
         Attachments: cc.log, demo.xml, execute.log, tpch_node1.log, tpch_node2.log


Description: If we configure more than 24 partitions on each NC, we always lose the data in almost half of the partitions after a restart. No error message or log entry is produced.
Schema:
{noformat}
drop dataverse tpch if exists;
create dataverse tpch;
use dataverse tpch;

create type LineItemType as closed {
  l_orderkey: int32,
  l_partkey: int32,
  l_suppkey: int32,
  l_linenumber: int32,
  l_quantity: int32,
  l_extendedprice: double,
  l_discount: double,
  l_tax: double,
  l_returnflag: string,
  l_linestatus: string,
  l_shipdate: string,
  l_commitdate: string,
  l_receiptdate: string,
  l_shipinstruct: string,
  l_shipmode: string,
  l_comment: string
}

create dataset LineItem(LineItemType)
  primary key l_orderkey, l_linenumber;
load dataset LineItem 
using localfs
(("path"="127.0.0.1:///path-to-tpch-data/tpch0.001/lineitem.tbl"),("format"="delimited-text"),("delimiter"="|"));
{noformat}
Query:
{noformat}
use dataverse tpch;
let $s := count(
for $d in dataset LineItem
return $d
)
return $s
{noformat}
Return:
{noformat}
6005
{noformat}
Command:
{noformat}
managix stop -n tpch
managix start -n tpch
{noformat}
Query:
{noformat}
use dataverse tpch;
let $s := count(
for $d in dataset LineItem
return $d
)
return $s
{noformat}
Return:
{noformat}
4521
{noformat}
In this tiny test we lose about a quarter of the records (1,484 of 6,005). When we scale TPC-H up to 200 GB across 192 partitions (8 NCs x 24 partitions each), we should get 1.2 billion records, but the query returns only 0.45 billion!
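For reference, these are the loss rates implied by the counts reported above. The 200 GB figures are the approximate values given in this report, not exact counts.

{noformat}
\text{tiny test: } \frac{6005 - 4521}{6005} = \frac{1484}{6005} \approx 24.7\%\ \text{of records lost}

\text{200 GB: } \frac{1.2 - 0.45}{1.2} = \frac{0.75}{1.2} = 62.5\%\ \text{of records lost}
{noformat}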



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
