[ 
https://issues.apache.org/jira/browse/BEAM-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235596#comment-17235596
 ] 

Beam JIRA Bot commented on BEAM-11002:
--------------------------------------

This issue is assigned but has not received an update in 30 days so it has been 
labeled "stale-assigned". If you are still working on the issue, please give an 
update and remove the label. If you are no longer working on the issue, please 
unassign so someone else may work on it. In 7 days the issue will be 
automatically unassigned.

> XmlIO buffer overflow exception 
> --------------------------------
>
>                 Key: BEAM-11002
>                 URL: https://issues.apache.org/jira/browse/BEAM-11002
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-xml
>    Affects Versions: 2.23.0, 2.24.0
>            Reporter: Duncan Lew
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P1
>              Labels: Clarified, stale-assigned
>
> We're making use of Apache Beam in Google Dataflow.
> We're using XmlIO to read in an XML file with the following setup:
> {code:java}
> pipeline
>     .apply(
>         "Read Storage Bucket",
>         XmlIO.read<XmlProduct>()
>             .from(sourcePath)
>             .withRootElement(xmlProductRoot)
>             .withRecordElement(xmlProductRecord)
>             .withRecordClass(XmlProduct::class.java)
>     )
> {code}
> However, from time to time we get a buffer overflow exception when reading
> random XML files:
> {code:java}
> "Error message from worker: java.io.IOException: Failed to start reading from source: gs://path-to-xml-file.xml range [1722550, 2684411)
>       org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:610)
>       org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:359)
>       org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>       org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>       org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>       org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
>       org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
>       org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
>       org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>       org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>       org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>       java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.nio.BufferOverflowException
>       java.base/java.nio.Buffer.nextPutIndex(Buffer.java:662)
>       java.base/java.nio.HeapByteBuffer.put(HeapByteBuffer.java:196)
>       org.apache.beam.sdk.io.xml.XmlSource$XMLReader.getFirstOccurenceOfRecordElement(XmlSource.java:285)
>       org.apache.beam.sdk.io.xml.XmlSource$XMLReader.startReading(XmlSource.java:192)
>       org.apache.beam.sdk.io.FileBasedSource$FileBasedReader.startImpl(FileBasedSource.java:476)
>       org.apache.beam.sdk.io.OffsetBasedSource$OffsetBasedReader.start(OffsetBasedSource.java:249)
>       org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:607)
>       ... 14 more
> {code}
> We can't reproduce this buffer overflow exception locally with the 
> DirectRunner. If we rerun the Dataflow job on Google Cloud, it can complete 
> correctly without any exceptions.
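
For readers unfamiliar with the "Caused by" part of the trace: the exception is thrown by HeapByteBuffer.put, called from XmlSource's getFirstOccurenceOfRecordElement. The sketch below is not Beam's code and does not claim to reproduce the bug. It only illustrates the general java.nio behavior that a relative put into a fixed-capacity heap buffer with no space remaining throws BufferOverflowException. The class name, method name, and buffer sizes are hypothetical and chosen purely for illustration.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BufferOverflowSketch {

    // Writes `count` bytes, one at a time, into a heap buffer of fixed `capacity`.
    // Returns true if a BufferOverflowException was thrown along the way.
    public static boolean overflows(int capacity, int count) {
        // ByteBuffer.allocate returns a heap-backed buffer with a fixed capacity.
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        try {
            for (int i = 0; i < count; i++) {
                // A relative put fails once position reaches the limit.
                buf.put((byte) 'x');
            }
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows(8, 8)); // exactly fills the buffer
        System.out.println(overflows(8, 9)); // one byte more than the capacity
    }
}
```

Whether the failure in XmlSource comes from a fixed-size buffer being too small for the bytes it accumulates at a particular split offset is an assumption that would need to be checked against the XmlSource source for the affected versions.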



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
