yangcao created KYLIN-3428:
------------------------------
Summary: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
Key: KYLIN-3428
URL: https://issues.apache.org/jira/browse/KYLIN-3428
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v2.4.0, v2.3.1, v2.3.0, v2.2.0, v2.1.0
Environment: kylin v2.2.0 jdk7
Reporter: yangcao
LOG:
2018-06-26 15:50:24,032 INFO [main] org.apache.kylin.dict.DictionaryManager: DictionaryManager(1499050426) loading DictionaryInfo(loadDictObj:true) at /dict/xxx.xxx/C7/036b7ca0-8733-4c0c-99f5-5122919fd3dd.dict
2018-06-26 15:50:25,586 ERROR [main] org.apache.kylin.engine.mr.KylinMapper:
com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
    at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
    at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
    at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
    at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
    at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
    at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
    at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
    at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
    at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
    at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1793)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
    at org.apache.kylin.common.persistence.FileResourceStore.getResourceImpl(FileResourceStore.java:123)
    at org.apache.kylin.common.persistence.ResourceStore.getResource(ResourceStore.java:154)
    at org.apache.kylin.dict.DictionaryManager.load(DictionaryManager.java:418)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:101)
    at org.apache.kylin.dict.DictionaryManager$1.load(DictionaryManager.java:98)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
    at org.apache.kylin.dict.DictionaryManager.getDictionaryInfo(DictionaryManager.java:118)
    at org.apache.kylin.cube.CubeManager.getDictionary(CubeManager.java:271)
    at org.apache.kylin.cube.CubeSegment.getDictionary(CubeSegment.java:320)
    at org.apache.kylin.cube.kv.CubeDimEncMap.getDictionary(CubeDimEncMap.java:86)
    at org.apache.kylin.cube.kv.CubeDimEncMap.get(CubeDimEncMap.java:65)
    at org.apache.kylin.cube.kv.RowKeyColumnIO.getColumnLength(RowKeyColumnIO.java:43)
    at org.apache.kylin.cube.kv.RowKeyEncoder.<init>(RowKeyEncoder.java:59)
    at org.apache.kylin.cube.kv.AbstractRowKeyEncoder.createInstance(AbstractRowKeyEncoder.java:48)
    at org.apache.kylin.engine.mr.common.BaseCuboidBuilder.<init>(BaseCuboidBuilder.java:84)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.doSetup(BaseCuboidMapperBase.java:70)
    at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.doSetup(HiveToBaseCuboidMapper.java:36)
    at org.apache.kylin.engine.mr.KylinMapper.setup(KylinMapper.java:48)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
Root cause analysis:
# C7 is a high-cardinality dimension whose values are long on average; the dictionary file for it is 1,085,484,823 bytes.
# Kylin loads the dictionary file in FileResourceStore.getResourceImpl(). The ByteArrayOutputStream used there starts with an initial capacity of 1000 bytes and keeps expanding during the copy. The growth logic (each expansion at least doubles the capacity, capped at Integer.MAX_VALUE) is:
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = buf.length;
    int newCapacity = oldCapacity << 1;
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity < 0) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    buf = Arrays.copyOf(buf, newCapacity);
}
# The JVM limits the maximum array length, and the limit can differ across environments; you can probe the exact value with byte[] bytes = new byte[length]. It is typically Integer.MAX_VALUE - 2.
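The doubling described above can be replayed to see why this particular file triggers the error. The sketch below is illustrative (the class and method names are not Kylin code); it only models the "capacity at least doubles" policy of grow() starting from the 1000-byte initial buffer:

```java
// Illustrative model of ByteArrayOutputStream's doubling growth:
// starting from the 1000-byte initial buffer, each grow() at least
// doubles the capacity until it covers the requested size.
class GrowthDemo {
    static long finalCapacity(long initial, long needed) {
        long cap = initial;
        while (cap < needed) {
            cap <<= 1; // grow(): newCapacity = oldCapacity << 1
        }
        return cap;
    }

    public static void main(String[] args) {
        // For the 1,085,484,823-byte dictionary file, the last
        // expansion requests a ~2.1 GB array, which can exceed the
        // JVM's maximum array length even when heap space is free.
        System.out.println(GrowthDemo.finalCapacity(1000L, 1_085_484_823L)); // prints 2097152000
    }
}
```

So the copy fails not because the file itself is too large for a byte[], but because the doubling overshoots to nearly double the file size.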
Suggested fix:
Set the initial capacity of the ByteArrayOutputStream to the file's byte length, so that no resizing (and no doubling copy) occurs.
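A minimal sketch of that suggestion follows; the class and method names are illustrative and this is not the actual FileResourceStore.getResourceImpl() patch. It pre-sizes the buffer from the known file length and also rejects files that cannot fit in a single byte[] at all:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch of the suggested fix: size the output buffer from the
// file length up front, so ByteArrayOutputStream.grow() is never called.
class PresizedRead {
    static byte[] readFully(Path file) throws IOException {
        long len = Files.size(file);
        // A byte[] can never hold more than roughly Integer.MAX_VALUE
        // bytes; fail fast with a clear message instead of an OOM.
        if (len > Integer.MAX_VALUE - 8) {
            throw new IOException("resource too large for one byte[]: " + len);
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream((int) len);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n); // never exceeds the pre-sized capacity
            }
        }
        return bos.toByteArray();
    }
}
```

With the buffer pre-sized, the peak transient memory is one copy of the file (plus the final toByteArray() copy), instead of the ~2x overshoot the doubling policy produces.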
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)