[
https://issues.apache.org/jira/browse/SPARK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Farrellee resolved SPARK-1443.
--------------------------------------
Resolution: Done
Fix Version/s: (was: 0.9.0)
> Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
> ----------------------------------------------------------------------
>
> Key: SPARK-1443
> URL: https://issues.apache.org/jira/browse/SPARK-1443
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output, Java API, Spark Core
> Affects Versions: 0.9.0
> Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.04
> Reporter: Pavan Kumar Varma
> Priority: Critical
> Labels: GridFS, MongoDB, Spark, hadoop2, java
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process that
> GridFS collection data with the Java Spark MapReduce API. I have previously
> processed MongoDB collections with Apache Spark using the mongo-hadoop
> connector, but I am unable to read GridFS collections with the following
> code:
> MongoConfigUtil.setInputURI(config,
>     "mongodb://localhost:27017/pdfbooks.fs.chunks");
> MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>     com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>     BSONObject.class);
> JavaRDD<String> words = mongoRDD.flatMap(
>     new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
>         @Override
>         public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
>             System.out.println(arg._2.toString());
>             ...
> Please suggest or provide a better API method for accessing MongoDB GridFS data.
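
A note on the underlying data model: reading `pdfbooks.fs.chunks` through MongoInputFormat yields one BSONObject per chunk document, not the whole file. GridFS splits a file into chunk documents that each carry the parent file id ("files_id"), a zero-based chunk index ("n"), and the binary payload ("data"); reconstructing the file means grouping chunks by "files_id" and concatenating payloads in index order. The sketch below illustrates only that reassembly step in plain Java (the class name `GridFsReassembler` is hypothetical, and the Spark-side grouping by "files_id" is omitted):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: given the chunk payloads of one GridFS file keyed
// by their chunk index "n", concatenate them in ascending index order to
// recover the original file bytes.
class GridFsReassembler {
    static byte[] reassemble(Map<Integer, byte[]> chunksByIndex) {
        List<Integer> indices = new ArrayList<>(chunksByIndex.keySet());
        Collections.sort(indices); // GridFS chunk order is defined by "n"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n : indices) {
            byte[] chunk = chunksByIndex.get(n);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Simulate two chunks of one file arriving out of order.
        Map<Integer, byte[]> chunks = new HashMap<>();
        chunks.put(1, "world".getBytes());
        chunks.put(0, "hello ".getBytes());
        System.out.println(new String(reassemble(chunks))); // prints "hello world"
    }
}
```

In a Spark job this logic would sit after a groupBy on the "files_id" field of each chunk BSONObject; for binary formats such as PDF the reassembled bytes would then be handed to a parser rather than split into words.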
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]