[ 
https://issues.apache.org/jira/browse/BEAM-3165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236943#comment-17236943
 ] 

Eugene Nikolaiev commented on BEAM-3165:
----------------------------------------

The MongoDB readers currently rely on the standard hex _id field to split 
a collection into bundles using an ObjectId range tracker, so no quick fix 
is possible. Either custom range trackers would need to be implemented, or 
(maybe) an option to disable splitting into bundles would need to be added.
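
To illustrate the second option, here is a minimal sketch in plain Java of how 
a reader could decide whether ObjectId range splitting is safe. It is not 
Beam or MongoDB driver code: the class SplitStrategySketch, the Strategy enum 
and the choose() helper are hypothetical names invented for this example, and 
the only assumption about ObjectId is that its standard string form is 
exactly 24 hexadecimal characters.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of MongoDbIO: picks a splitting strategy
// based on whether sampled _id values look like standard hex ObjectIds.
class SplitStrategySketch {

    // Hypothetical strategies: split by ObjectId ranges, or read everything
    // as one bundle (that is, splitting disabled).
    enum Strategy { OBJECT_ID_RANGES, SINGLE_BUNDLE }

    // True if s has the standard ObjectId string form: 24 hex characters.
    static boolean isHexObjectId(String s) {
        if (s == null || s.length() != 24) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean hex = (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'f')
                    || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    // Falls back to a single bundle as soon as one sampled _id is not a
    // hex ObjectId, because an ObjectId range tracker cannot handle it.
    static Strategy choose(List<String> sampledIds) {
        for (String id : sampledIds) {
            if (!isHexObjectId(id)) {
                return Strategy.SINGLE_BUNDLE;
            }
        }
        return Strategy.OBJECT_ID_RANGES;
    }

    public static void main(String[] args) {
        System.out.println(choose(Arrays.asList("507f1f77bcf86cd799439011")));
        System.out.println(choose(Arrays.asList("507f1f77bcf86cd799439011", "somestring")));
    }
}
```

Reading everything as a single bundle gives up parallelism, so a custom range 
tracker would still be the better long-term answer for large collections.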

> Mongo document read with non hex objectid
> -----------------------------------------
>
>                 Key: BEAM-3165
>                 URL: https://issues.apache.org/jira/browse/BEAM-3165
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-mongodb
>    Affects Versions: 2.1.0
>            Reporter: Utkarsh Sopan
>            Priority: P3
>
> I have a Mongo collection whose '_id' values are non-hex strings.
> I can't read them into a PCollection; I get the following exception:
> Exception in thread "main" java.lang.IllegalArgumentException: invalid hexadecimal representation of an ObjectId: [somestring]
>       at org.bson.types.ObjectId.parseHexString(ObjectId.java:523)
>       at org.bson.types.ObjectId.<init>(ObjectId.java:237)
>       at org.bson.json.JsonReader.visitObjectIdConstructor(JsonReader.java:674)
>       at org.bson.json.JsonReader.readBsonType(JsonReader.java:197)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:139)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>       at org.bson.codecs.configuration.LazyCodec.decode(LazyCodec.java:47)
>       at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>       at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:215)
>       at org.bson.codecs.DocumentCodec.readList(DocumentCodec.java:222)
>       at org.bson.codecs.DocumentCodec.readValue(DocumentCodec.java:208)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:141)
>       at org.bson.codecs.DocumentCodec.decode(DocumentCodec.java:45)
>       at org.bson.Document.parse(Document.java:105)
>       at org.bson.Document.parse(Document.java:90)
>       at org.apache.beam.sdk.io.mongodb.MongoDbIO$BoundedMongoDbReader.start(MongoDbIO.java:472)
>       at org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:141)
>       at org.apache.beam.runners.direct.TransformExecutor.processElements(TransformExecutor.java:146)
>       at org.apache.beam.runners.direct.TransformExecutor.run(TransformExecutor.java:110)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
