[
https://issues.apache.org/jira/browse/DRILL-5472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Challapalli updated DRILL-5472:
-------------------------------------
Attachment: drill5472.sys.drill
drill5472.parquet
drill5472.log
Below is the value vector density information from the logs:
{code}
DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
mapid(type: VARCHAR, std col. size: 54, actual col. size: 8, total size:
73728, data size: 56004, row capacity: 8191, density: 76)
col1(type: BIGINT, std col. size: 8, actual col. size: 9, total size: 73728,
data size: 72000, row capacity: 8192, density: 98)
col2(type: BIGINT, std col. size: 8, actual col. size: 29, total size:
294912, data size: 224004, row capacity: 8191, density: 76)
a(type: VARCHAR, std col. size: 54, actual col. size: 7, total size: 196608,
data size: 126004, row capacity: 32767, density: 65)
b(type: VARCHAR, std col. size: 54, actual col. size: 6, total size: 196608,
data size: 112004, row capacity: 32767, density: 57)
c(type: BIGINT, std col. size: 8, actual col. size: 9, total size: 294912,
data size: 180000, row capacity: 32768, density: 62)
d(type: FLOAT8, std col. size: 8, actual col. size: 9, total size: 294912,
data size: 180000, row capacity: 32768, density: 62)
T10¦¦missing1(type: INT, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing2(type: VARCHAR, std col. size: 54, actual col. size: 11, total
size: 4521984, data size: 40004, row capacity: 65535, density: 1)
T10¦¦missing3(type: BIT, std col. size: 1, actual col. size: 2, total size:
73728, data size: 8000, row capacity: 65536, density: 11)
T10¦¦missing4(type: FLOAT8, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing5(type: VARCHAR, std col. size: 54, actual col. size: 10, total
size: 4521984, data size: 36004, row capacity: 65535, density: 1)
T10¦¦missing6(type: DATE, std col. size: 8, actual col. size: 9, total size:
589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing7(type: FLOAT8, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing8(type: VARCHAR, std col. size: 54, actual col. size: 9, total
size: 4521984, data size: 32004, row capacity: 65535, density: 1)
T10¦¦missing9(type: INTERVAL, std col. size: 16, actual col. size: 17, total
size: 1114112, data size: 68000, row capacity: 65536, density: 7)
T10¦¦missing10(type: INTERVAL, std col. size: 16, actual col. size: 17, total
size: 1114112, data size: 68000, row capacity: 65536, density: 7)
T10¦¦missing11(type: INTERVAL, std col. size: 16, actual col. size: 17, total
size: 1114112, data size: 68000, row capacity: 65536, density: 7)
x(type: BIGINT, std col. size: 8, actual col. size: 9, total size: 36864,
data size: 36000, row capacity: 4096, density: 98)
y(type: FLOAT8, std col. size: 8, actual col. size: 9, total size: 36864,
data size: 36000, row capacity: 4096, density: 98)
T10¦¦missing13(type: VARCHAR, std col. size: 54, actual col. size: 22, total
size: 4521984, data size: 84004, row capacity: 65535, density: 2)
T10¦¦missing14(type: VARCHAR, std col. size: 54, actual col. size: 22, total
size: 4521984, data size: 84004, row capacity: 65535, density: 2)
T10¦¦missing15(type: VARCHAR, std col. size: 54, actual col. size: 38, total
size: 4521984, data size: 148004, row capacity: 65535, density: 4)
T10¦¦missing16(type: VARCHAR, std col. size: 54, actual col. size: 38, total
size: 4521984, data size: 148004, row capacity: 65535, density: 4)
T10¦¦missing17(type: VARCHAR, std col. size: 54, actual col. size: 10, total
size: 4521984, data size: 36004, row capacity: 65535, density: 1)
T10¦¦missing18(type: VARCHAR, std col. size: 54, actual col. size: 11, total
size: 4521984, data size: 40004, row capacity: 65535, density: 1)
T10¦¦missing19(type: VARCHAR, std col. size: 54, actual col. size: 39, total
size: 4521984, data size: 152004, row capacity: 65535, density: 4)
T10¦¦missing20(type: INT, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing21(type: INT, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing22(type: INT, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing23(type: VARBINARY, std col. size: 54, actual col. size: 10,
total size: 4521984, data size: 36004, row capacity: 65535, density: 1)
T10¦¦missing24(type: VARCHAR, std col. size: 54, actual col. size: 15, total
size: 4521984, data size: 56004, row capacity: 65535, density: 2)
T10¦¦missing25(type: VARCHAR, std col. size: 54, actual col. size: 17, total
size: 4521984, data size: 64004, row capacity: 65535, density: 2)
T10¦¦missing26(type: TIME, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing27(type: TIMESTAMP, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing28(type: DATE, std col. size: 8, actual col. size: 9, total size:
589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing 29(type: BIGINT, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing30(type: DATE, std col. size: 8, actual col. size: 9, total size:
589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing31(type: FLOAT8, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing32(type: TIME, std col. size: 4, actual col. size: 5, total size:
327680, data size: 20000, row capacity: 65536, density: 7)
T10¦¦missing33(type: TIMESTAMP, std col. size: 8, actual col. size: 9, total
size: 589824, data size: 36000, row capacity: 65536, density: 7)
T10¦¦missing34(type: VARCHAR, std col. size: 54, actual col. size: 49, total
size: 4521984, data size: 192004, row capacity: 65535, density: 5)
T10¦¦m1(type: INT, std col. size: 4, actual col. size: 5, total size: 327680,
data size: 20000, row capacity: 65536, density: 7)
EXPR$1(type: BIGINT, std col. size: 8, actual col. size: 9, total size:
36864, data size: 36000, row capacity: 4096, density: 98)
Records: 4000, Total size: 75870208, Gross row width:18969, Net row
width:574, Density:22}
{code}
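For anyone who wants to re-check the per-column numbers, here is a rough sketch that recomputes density from the "total size" and "data size" fields in the log above. It is not Drill's code: `density_percent` and `column_densities` are hypothetical helper names, and the round-up rule is an assumption that happens to match the logged values I checked (for example mapid = 76, a = 65, T10¦¦missing2 = 1).

```python
import re

# Hypothetical helpers for reading the log above; not part of Drill.
# One entry looks like:
#   name(type: T, std col. size: N, actual col. size: N, total size: N,
#   data size: N, row capacity: N, density: N)
# Entries are wrapped by the mailer, so whitespace may be a newline.
ENTRY = re.compile(
    r"(?P<name>\S+)\(type:.*?"
    r"total\s+size:\s*(?P<total>\d+),\s*data\s+size:\s*(?P<data>\d+)",
    re.DOTALL,
)


def density_percent(data_size: int, total_size: int) -> int:
    """Percentage of the allocated vector holding data, rounded up.

    Rounding up is an assumption; it reproduces the logged densities.
    """
    return -(-100 * data_size // total_size)


def column_densities(log_text: str) -> dict:
    """Map each column name found in log_text to its recomputed density."""
    return {
        m["name"]: density_percent(int(m["data"]), int(m["total"]))
        for m in ENTRY.finditer(log_text)
    }


if __name__ == "__main__":
    sample = (
        "mapid(type: VARCHAR, std col. size: 54, actual col. size: 8, total size:\n"
        "73728, data size: 56004, row capacity: 8191, density: 76)\n"
        "T10¦¦missing2(type: VARCHAR, std col. size: 54, actual col. size: 11, total\n"
        "size: 4521984, data size: 40004, row capacity: 65535, density: 1)\n"
    )
    print(column_densities(sample))
```

The VARCHAR "missing" columns stand out: each allocates 4521984 bytes to hold roughly 36 to 192 KB of data, which is where the 1 to 5 percent densities come from.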
> Parquet reader generating low-density batches causing Sort operator to spill
> un-necessarily
> -------------------------------------------------------------------------------------------
>
> Key: DRILL-5472
> URL: https://issues.apache.org/jira/browse/DRILL-5472
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Storage - Parquet
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Attachments: drill5472.log, drill5472.parquet, drill5472.sys.drill
>
>
> git.commit.id.abbrev=1e0a14c
> The parquet file used in the query below is ~20MB. The uncompressed size is
> ~1.2 GB. The query has a sort which is given ~6GB of memory for a
> single fragment, and yet it spills.
> {code}
> select * from (select * from
> dfs.`/drill/testdata/resource-manager/all_types_large` s order by
> s.missing12.x) d where d.missing3 is false;
> {code}
> The profile indicates that the above query spilled twice. The profile and
> the logs are attached.