[
https://issues.apache.org/jira/browse/SPARK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthew Farrellee resolved SPARK-1443.
--------------------------------------
Resolution: Done
Fix Version/s: (was: 0.9.0)
> Unable to Access MongoDB GridFS data with Spark using mongo-hadoop API
> ----------------------------------------------------------------------
>
> Key: SPARK-1443
> URL: https://issues.apache.org/jira/browse/SPARK-1443
> Project: Spark
> Issue Type: Improvement
> Components: Input/Output, Java API, Spark Core
> Affects Versions: 0.9.0
> Environment: Java 1.7, Hadoop 2.2.0, Spark 0.9.0, Ubuntu 12.04
> Reporter: Pavan Kumar Varma
> Priority: Critical
> Labels: GridFS, MongoDB, Spark, hadoop2, java
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> I saved a 2 GB PDF file into MongoDB using GridFS. Now I want to process that
> GridFS collection data with the Java Spark MapReduce API. I have previously
> processed MongoDB collections with Apache Spark using the mongo-hadoop
> connector, but I am unable to read GridFS collections with the following
> code:
> MongoConfigUtil.setInputURI(config,
>     "mongodb://localhost:27017/pdfbooks.fs.chunks");
> MongoConfigUtil.setOutputURI(config, "mongodb://localhost:27017/" + output);
> JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>     com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>     BSONObject.class);
> JavaRDD<String> words = mongoRDD.flatMap(
>     new FlatMapFunction<Tuple2<Object, BSONObject>, String>() {
>         @Override
>         public Iterable<String> call(Tuple2<Object, BSONObject> arg) {
>             System.out.println(arg._2.toString());
>             ...
> Please suggest or provide a better API method for accessing MongoDB GridFS data.
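
A note on the underlying data model: reading `pdfbooks.fs.chunks` through MongoInputFormat yields one BSONObject per chunk document, not the whole file. GridFS splits a file into chunk documents that each carry the parent file id ("files_id"), a zero-based chunk index ("n"), and the binary payload ("data"); reconstructing the file means grouping chunks by "files_id" and concatenating payloads in index order. The sketch below illustrates only that reassembly step in plain Java (the class name `GridFsReassembler` is hypothetical, and the Spark-side grouping by "files_id" is omitted):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: given the chunk payloads of one GridFS file keyed
// by their chunk index "n", concatenate them in ascending index order to
// recover the original file bytes.
class GridFsReassembler {
    static byte[] reassemble(Map<Integer, byte[]> chunksByIndex) {
        List<Integer> indices = new ArrayList<>(chunksByIndex.keySet());
        Collections.sort(indices); // GridFS chunk order is defined by "n"
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int n : indices) {
            byte[] chunk = chunksByIndex.get(n);
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Simulate two chunks of one file arriving out of order.
        Map<Integer, byte[]> chunks = new HashMap<>();
        chunks.put(1, "world".getBytes());
        chunks.put(0, "hello ".getBytes());
        System.out.println(new String(reassemble(chunks))); // prints "hello world"
    }
}
```

In a Spark job this logic would sit after a groupBy on the "files_id" field of each chunk BSONObject; for binary formats such as PDF the reassembled bytes would then be handed to a parser rather than split into words.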
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]