The error you provided hints that PySpark tries to read your pickle files as
sequence files, but they were written as plain pickle files, without the
SequenceFile format in mind.

I’m no PySpark expert, but I suggest you look into loading the pickle files as
binary files and deserializing them with custom code:
https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext.binaryFiles

Then you should be able to deserialize the records and flatMap the results to
get an RDD[YourType].
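
A minimal sketch of that approach (the glob path, the to_rows helper and the
one-Row-per-slice layout are just placeholders to adapt; it assumes each .pck
file holds a single pickled 3D array):

    import pickle

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # binaryFiles yields one (path, bytes) pair per file, so each pickle stays
    # intact instead of being parsed as a SequenceFile.
    raw = sc.binaryFiles(
        "/home/student/BigDL-trainings/elephantscale/data/volumetric_data/*.pck"
    )

    def to_rows(record):
        # Deserialize one pickled 3D array and emit one Row per 2D slice.
        path, content = record
        volume = pickle.loads(content)  # plain pickle, matching how the files were written
        for i, plane in enumerate(volume):
            # Flatten each slice into a plain list so createDataFrame can infer a schema.
            yield Row(file=path, slice=i, values=[float(v) for row in plane for v in row])

    df = spark.createDataFrame(raw.flatMap(to_rows))
    df.show(5)

How you lay out the rows depends on what BigDL expects; you may prefer one
flattened vector per file instead of one per slice.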

Best Regards

Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann



> On 26.08.2019 at 07:23, hxngillani <f2017279...@umt.edu.pk> wrote:
> 
> Hello dear members,
> I want to train a model using BigDL. I have a data set of medical images in the
> form of pickle object files (.pck); each pickle file is a 3D image (3D array).
> 
> I have tried:
> pickleRdd = sc.pickleFile("/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck")
> sqlContext = SQLContext(sc)
> df = sqlContext.createDataFrame(pickleRdd)
> 
> This code is throwing an error:
> Caused by: java.io.IOException:
> file:/home/student/BigDL-trainings/elephantscale/data/volumetric_data/329637-8.pck
> not a SequenceFile
> 
> 
> What I have come to know is that the function sc.pickleFile loads a pickle
> file that was created by rdd.saveAsPickleFile.
> 
> I am loading a pickle file that was created by Python's "pickle" module.
> My question is: is there any way to load that file into a Spark DataFrame?
> 
> 
> 
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> 
