[ https://issues.apache.org/jira/browse/IMPALA-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927907#comment-17927907 ]
Quanlong Huang commented on IMPALA-13764:
-----------------------------------------
One way to reproduce this is to create a partitioned table (e.g. with 10k partitions)
and reference it many times in a single query.
I used the following script to create the partition directories and data files locally:
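{code:bash}
#!/usr/bin/env bash
# Build 10k partition directories (p=1 .. p=10000), each holding one tiny text file.
NUM_PARTS=10000
DIR_NAME="parts_10k"
mkdir -p "$DIR_NAME"
pushd "$DIR_NAME"
echo 1 > data.txt
for i in $(seq "$NUM_PARTS"); do
  mkdir -p "p=$i"
  cp data.txt "p=$i/"
done
# Go back to the parent directory so that "hdfs dfs -put parts_10k" below finds the directory.
popd
{code}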
Then create the partitioned table in Impala:
{code:sql}
create table parts_10k(i int) partitioned by (p int) stored as textfile;
{code}
Replace the empty table directory with the local one by uploading the files to HDFS:
{noformat}
hdfs dfs -rmdir hdfs://localhost:20500/test-warehouse/parts_10k
hdfs dfs -put parts_10k hdfs://localhost:20500/test-warehouse/
{noformat}
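Then recover the partitions and prepare views that use the table many times. Each view unions 10 copies of the level below it, so parts_10m_v references parts_10k 1,000 times, i.e. 10 million partitions: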
{code:sql}
alter table parts_10k recover partitions;
create view parts_100k as
  select * from parts_10k union all select * from parts_10k union all
  select * from parts_10k union all select * from parts_10k union all
  select * from parts_10k union all select * from parts_10k union all
  select * from parts_10k union all select * from parts_10k union all
  select * from parts_10k union all select * from parts_10k;
create view parts_1m as
  select * from parts_100k union all select * from parts_100k union all
  select * from parts_100k union all select * from parts_100k union all
  select * from parts_100k union all select * from parts_100k union all
  select * from parts_100k union all select * from parts_100k union all
  select * from parts_100k union all select * from parts_100k;
create view parts_10m_v as
  select * from parts_1m union all select * from parts_1m union all
  select * from parts_1m union all select * from parts_1m union all
  select * from parts_1m union all select * from parts_1m union all
  select * from parts_1m union all select * from parts_1m union all
  select * from parts_1m union all select * from parts_1m;
{code}
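Instead of typing the union chains by hand, the three views can be generated with a small shell function. This is a minimal sketch: union_view is a hypothetical helper, not part of Impala, and it only prints the same DDL as above (one statement per line).

{code:bash}
#!/usr/bin/env bash
# Hypothetical helper (not part of Impala): prints
#   create view <name> as select * from <src> union all ... ;
# with "select * from <src>" repeated <copies> times.
union_view() {
  local name=$1 src=$2 copies=$3
  local sql="create view $name as select * from $src"
  local i
  for ((i = 2; i <= copies; i++)); do
    sql+=" union all select * from $src"
  done
  echo "$sql;"
}

# Each level unions 10 copies of the level below it.
union_view parts_100k parts_10k 10
union_view parts_1m parts_100k 10
union_view parts_10m_v parts_1m 10
{code}

The output can be saved to a file and run with "impala-shell -f"; changing the last argument is an easy way to scale the number of referenced partitions up or down.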
The following query then fails with an OutOfMemoryError, because the serialized TExecRequest exceeds the JVM array size limit:
{code:sql}
select count(*) from (select * from parts_10m_v union all select * from parts_10m_v) t;
{code}
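As a rough sanity check on why this query fails, the sketch below counts the scan ranges it plans and the smallest average serialized size per scan range at which a single byte array would pass the 2^31 - 1 limit. It assumes one scan range per file (each partition holds a single tiny file) and a separate scan of parts_10k for every reference through the views; the real per-scan-range size is not measured here.

{code:bash}
#!/usr/bin/env bash
# Assumption: one scan range per file, one file per partition.
PARTITIONS=10000   # parts_10k
FANOUT=10          # each view unions 10 copies of the level below
LEVELS=3           # parts_100k -> parts_1m -> parts_10m_v
BRANCHES=2         # the query unions parts_10m_v with itself

ranges=$PARTITIONS
for ((l = 0; l < LEVELS; l++)); do
  ranges=$((ranges * FANOUT))
done
ranges=$((ranges * BRANCHES))

# JVM arrays are int-indexed, so one byte[] holds at most 2^31 - 1 bytes
# (HotSpot allows slightly fewer in practice).
JVM_ARRAY_LIMIT=2147483647
# Smallest average bytes per scan range that pushes the total past the limit.
min_bytes=$((JVM_ARRAY_LIMIT / ranges + 1))

echo "scan ranges: $ranges"
echo "bytes per scan range needed to exceed the limit: $min_bytes"
{code}

Under these assumptions the query plans 20,000,000 scan ranges, so an average of only 108 serialized bytes per scan range (file path, offsets, lengths and host locations) is enough to exceed the limit.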
> JniFrontend.createExecRequest hits OOM of JVM array limit
> ---------------------------------------------------------
>
> Key: IMPALA-13764
> URL: https://issues.apache.org/jira/browse/IMPALA-13764
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> When a query has to scan lots of files, serializing the TExecRequest can require
> a huge byte array, which may hit an OutOfMemoryError by exceeding the JVM array
> size limit (2GB):
> {noformat}
> I0215 09:22:04.852778 322082 jni-util.cc:321] b04ef5f1a668be58:5831dd9200000000] java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> at java.util.Arrays.copyOf(Arrays.java:3236)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
> at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:197)
> at org.apache.thrift.protocol.TBinaryProtocol.writeI32(TBinaryProtocol.java:204)
> at org.apache.impala.thrift.TScanRangeLocation$TScanRangeLocationStandardScheme.write(TScanRangeLocation.java:503)
> at org.apache.impala.thrift.TScanRangeLocation$TScanRangeLocationStandardScheme.write(TScanRangeLocation.java:448)
> at org.apache.impala.thrift.TScanRangeLocation.write(TScanRangeLocation.java:391)
> at org.apache.impala.thrift.TScanRangeLocationList$TScanRangeLocationListStandardScheme.write(TScanRangeLocationList.java:468)
> at org.apache.impala.thrift.TScanRangeLocationList$TScanRangeLocationListStandardScheme.write(TScanRangeLocationList.java:402)
> at org.apache.impala.thrift.TScanRangeLocationList.write(TScanRangeLocationList.java:342)
> at org.apache.impala.thrift.TScanRangeSpec$TScanRangeSpecStandardScheme.write(TScanRangeSpec.java:485)
> at org.apache.impala.thrift.TScanRangeSpec$TScanRangeSpecStandardScheme.write(TScanRangeSpec.java:413)
> at org.apache.impala.thrift.TScanRangeSpec.write(TScanRangeSpec.java:355)
> at org.apache.impala.thrift.TPlanExecInfo$TPlanExecInfoStandardScheme.write(TPlanExecInfo.java:512)
> at org.apache.impala.thrift.TPlanExecInfo$TPlanExecInfoStandardScheme.write(TPlanExecInfo.java:425)
> at org.apache.impala.thrift.TPlanExecInfo.write(TPlanExecInfo.java:366)
> at org.apache.impala.thrift.TQueryExecRequest$TQueryExecRequestStandardScheme.write(TQueryExecRequest.java:1915)
> at org.apache.impala.thrift.TQueryExecRequest$TQueryExecRequestStandardScheme.write(TQueryExecRequest.java:1711)
> at org.apache.impala.thrift.TQueryExecRequest.write(TQueryExecRequest.java:1516)
> at org.apache.impala.thrift.TExecRequest$TExecRequestStandardScheme.write(TExecRequest.java:2920)
> at org.apache.impala.thrift.TExecRequest$TExecRequestStandardScheme.write(TExecRequest.java:2567)
> at org.apache.impala.thrift.TExecRequest.write(TExecRequest.java:2240)
> at org.apache.thrift.TSerializer.serialize(TSerializer.java:84)
> at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:180)
> {noformat}