[ 
https://issues.apache.org/jira/browse/IMPALA-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927907#comment-17927907
 ] 

Quanlong Huang commented on IMPALA-13764:
-----------------------------------------

One way to reproduce this is to create a partitioned table (e.g. with 10k partitions) 
and reference it many times in a single query.

I used the following script to create the dirs and files locally:
{code:bash}
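NUM_PARTS=10000
DIR_NAME="parts_10k"

mkdir -p "$DIR_NAME"
pushd "$DIR_NAME"
# One tiny data file, copied into every partition directory.
echo 1 > data.txt
for i in $(seq "$NUM_PARTS"); do
  mkdir -p "p=$i"
  cp data.txt "p=$i/"
done
# Return to the parent directory so the later "hdfs dfs -put parts_10k" resolves.
popd{code}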
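Before uploading, it may be worth confirming that the local layout has the expected number of partition directories and files. The helper names below are hypothetical (they are not part of Impala or the script above), and the sketch assumes a `find` that supports `-mindepth`/`-maxdepth` (GNU and BSD `find` both do):

```shell
# Count immediate subdirectories named p=<n> under the given directory.
count_partition_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -name 'p=*' | wc -l | tr -d ' '
}

# Count data.txt files that sit directly inside a partition directory.
count_partition_files() {
  find "$1" -mindepth 2 -maxdepth 2 -type f -name 'data.txt' | wc -l | tr -d ' '
}

# Usage, run from the directory that contains parts_10k:
#   count_partition_dirs parts_10k
#   count_partition_files parts_10k
```

Both calls should report the value of NUM_PARTS if the script completed.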
Then create a table for it:
{code:sql}
impala> create table parts_10k(i int) partitioned by (p int) stored as textfile;{code}
Upload the files to HDFS, replacing the empty table directory:
{noformat}
hdfs dfs -rmdir hdfs://localhost:20500/test-warehouse/parts_10k
hdfs dfs -put parts_10k hdfs://localhost:20500/test-warehouse/ {noformat}
Recover the partitions, then build nested views that each reference their source 10 times:
{code:sql}
alter table parts_10k recover partitions;

create view parts_100k as
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k union all
select * from parts_10k;

create view parts_1m as
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k union all
select * from parts_100k;

create view parts_10m_v as
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m union all
select * from parts_1m;
{code}
The following query then fails with an OutOfMemoryError because the serialized request exceeds the JVM array size limit:
{code:sql}
select count(*) from (
  select * from parts_10m_v union all select * from parts_10m_v
) t;{code}
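As a rough sanity check on why this query is large enough to hit the limit, the arithmetic can be sketched in shell. This is a back-of-the-envelope estimate, not a measurement: it assumes one scan range per partition file (each data.txt is tiny) and uses Integer.MAX_VALUE as an upper bound on the JVM array size (the real VM limit is slightly lower). The function names are hypothetical.

```shell
NUM_PARTS=10000           # partitions in parts_10k
VIEW_FANOUT=10            # each view unions its source 10 times
VIEW_LEVELS=3             # parts_100k, parts_1m, parts_10m_v
QUERY_FANOUT=2            # the final query unions parts_10m_v twice
JVM_MAX_ARRAY=2147483647  # Integer.MAX_VALUE, an upper bound on a byte[] length

# Partition scans referenced by the final query (assumed: one scan range each).
total_scan_ranges() {
  local total=$NUM_PARTS
  local i
  for ((i = 0; i < VIEW_LEVELS; i++)); do
    total=$((total * VIEW_FANOUT))
  done
  echo $((total * QUERY_FANOUT))
}

# Average serialized bytes per scan range that would still fit in one array.
max_avg_bytes_per_range() {
  echo $((JVM_MAX_ARRAY / $(total_scan_ranges)))
}

echo "scan ranges: $(total_scan_ranges)"
echo "byte budget per range: $(max_avg_bytes_per_range)"
```

Under these assumptions the query references 20,000,000 partition scans, so the serialized plan overflows a single byte array once the average serialized scan range exceeds roughly 107 bytes.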

> JniFrontend.createExecRequest hits OOM of JVM array limit
> ---------------------------------------------------------
>
>                 Key: IMPALA-13764
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13764
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> When a query has many files to scan, serializing the TExecRequest can require 
> a byte array larger than the JVM array size limit (about 2GB), which fails 
> with an OutOfMemoryError:
> {noformat}
> I0215 09:22:04.852778 322082 jni-util.cc:321] b04ef5f1a668be58:5831dd9200000000] java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>       at java.util.Arrays.copyOf(Arrays.java:3236)
>       at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
>       at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
>       at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>       at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:197)
>       at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:204)
>       at org.apache.impala.thrift.TScanRangeLocation$TScanRangeLocationStandardScheme.write(TScanRangeLocation.java:503)
>       at org.apache.impala.thrift.TScanRangeLocation$TScanRangeLocationStandardScheme.write(TScanRangeLocation.java:448)
>       at org.apache.impala.thrift.TScanRangeLocation.write(TScanRangeLocation.java:391)
>       at org.apache.impala.thrift.TScanRangeLocationList$TScanRangeLocationListStandardScheme.write(TScanRangeLocationList.java:468)
>       at org.apache.impala.thrift.TScanRangeLocationList$TScanRangeLocationListStandardScheme.write(TScanRangeLocationList.java:402)
>       at org.apache.impala.thrift.TScanRangeLocationList.write(TScanRangeLocationList.java:342)
>       at org.apache.impala.thrift.TScanRangeSpec$TScanRangeSpecStandardScheme.write(TScanRangeSpec.java:485)
>       at org.apache.impala.thrift.TScanRangeSpec$TScanRangeSpecStandardScheme.write(TScanRangeSpec.java:413)
>       at org.apache.impala.thrift.TScanRangeSpec.write(TScanRangeSpec.java:355)
>       at org.apache.impala.thrift.TPlanExecInfo$TPlanExecInfoStandardScheme.write(TPlanExecInfo.java:512)
>       at org.apache.impala.thrift.TPlanExecInfo$TPlanExecInfoStandardScheme.write(TPlanExecInfo.java:425)
>       at org.apache.impala.thrift.TPlanExecInfo.write(TPlanExecInfo.java:366)
>       at org.apache.impala.thrift.TQueryExecRequest$TQueryExecRequestStandardScheme.write(TQueryExecRequest.java:1915)
>       at org.apache.impala.thrift.TQueryExecRequest$TQueryExecRequestStandardScheme.write(TQueryExecRequest.java:1711)
>       at org.apache.impala.thrift.TQueryExecRequest.write(TQueryExecRequest.java:1516)
>       at org.apache.impala.thrift.TExecRequest$TExecRequestStandardScheme.write(TExecRequest.java:2920)
>       at org.apache.impala.thrift.TExecRequest$TExecRequestStandardScheme.write(TExecRequest.java:2567)
>       at org.apache.impala.thrift.TExecRequest.write(TExecRequest.java:2240)
>       at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
>       at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:180){noformat}


