[ 
https://issues.apache.org/jira/browse/DRILL-6594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540923#comment-16540923
 ] 

ASF GitHub Bot commented on DRILL-6594:
---------------------------------------

bitblender commented on issue #1375: DRILL-6594: Data batches for Project 
operator are not being split properly and exceed the maximum specified
URL: https://github.com/apache/drill/pull/1375#issuecomment-404350897
 
 
   @Ben-Zvi @ppadma Can one of you please take a look at this? This is an 
important fix that should be in 1.14.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Data batches for Project operator are not being split properly and exceed the 
> maximum specified
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6594
>                 URL: https://issues.apache.org/jira/browse/DRILL-6594
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.14.0
>            Reporter: Robert Hou
>            Assignee: Karthikeyan Manivannan
>            Priority: Major
>             Fix For: 1.14.0
>
>
> I ran this query:
>
> alter session set `drill.exec.memory.operator.project.output_batch_size` = 131072;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.width.max_per_query` = 1;
> select * from (
> select
> case when false then c.CharacterValuea else i.IntegerValuea end IntegerValuea,
> case when false then c.CharacterValueb else i.IntegerValueb end IntegerValueb,
> case when false then c.CharacterValuec else i.IntegerValuec end IntegerValuec,
> case when false then c.CharacterValued else i.IntegerValued end IntegerValued,
> case when false then c.CharacterValuee else i.IntegerValuee end IntegerValuee
> from (select * from dfs.`/drill/testdata/batch_memory/character5_1MB.parquet` order by CharacterValuea) c,
> dfs.`/drill/testdata/batch_memory/integer5_1MB.parquet` i
> where i.Index = c.Index and
> c.CharacterValuea = '1234567890123100') limit 10;
>
> An incoming batch looks like this:
>
> 2018-06-14 19:28:10,905 [24dcdbc7-2f42-16a9-56f1-9cf58bc549bc:frag:5:0] DEBUG o.a.d.e.p.i.p.ProjectMemoryManager - BATCH_STATS, incoming: Batch size: { Records: 32768, Total size: 20512768, Data size: 9175040, Gross row width: 626, Net row width: 280, Density: 45% }
>
> An outgoing batch looks like this:
>
> 2018-06-14 19:28:10,911 [24dcdbc7-2f42-16a9-56f1-9cf58bc549bc:frag:5:0] DEBUG o.a.d.e.p.i.p.ProjectRecordBatch - BATCH_STATS, outgoing: Batch size: { Records: 1023, Total size: 11018240, Data size: 138105, Gross row width: 10771, Net row width: 135, Density: 2% }
>
> The data size (138105) exceeds the maximum batch size (131072).
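
The numbers in the outgoing BATCH_STATS line can be checked by hand. The
sketch below is only back-of-the-envelope arithmetic on the figures quoted
above, not Drill's actual ProjectMemoryManager logic. It assumes the limit
applies to the reported data size and that every row has the reported net
row width; the helper name max_rows_within_limit is made up for this
illustration.

```python
# Figures copied from the outgoing BATCH_STATS log line in the report.
MAX_BATCH_BYTES = 131072  # value set for ...project.output_batch_size
records = 1023
data_size = 138105
net_row_width = 135


def max_rows_within_limit(limit_bytes, row_width):
    """Hypothetical helper: most fixed-width rows that fit under the limit.

    Assumes a constant row width, which is a simplification.
    """
    return limit_bytes // row_width


# The reported data size is consistent with records * net row width.
assert records * net_row_width == data_size

overshoot = data_size - MAX_BATCH_BYTES
row_cap = max_rows_within_limit(MAX_BATCH_BYTES, net_row_width)

print(f"overshoot: {overshoot} bytes")
print(f"rows that would fit under the limit: {row_cap}")
```

Under these assumptions, the outgoing batch would have had to hold fewer
than 1023 rows to stay within the configured 131072 bytes.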



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
