[SPARK-8794] [SQL] PrunedScan problem

Eron Wright Thu, 02 Jul 2015 09:04:03 -0700

I filed SPARK-8794 for a problem I see with PrunedScan that causes sub-optimal 
performance in ML pipelines.
Sorry if the issue is already known.
Having tried a few approaches to working with large binary files in Spark ML, 
I prefer loading the data into a vector-type column from a relation that 
supports pruned scans.  This is better, I think, than a lazy-loading scheme 
based on binaryFiles/PortableDataStream.  SPARK-8794 undermines this approach.
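
To make the idea concrete, here is a minimal sketch in plain Python of why a pruned scan matters for this kind of data. It is not the Spark API: ToyPrunedRelation, build_scan, the column names "path" and "features", and the in-memory file table are all hypothetical names chosen for illustration. The point is only that a relation told which columns are required can skip decoding the large binary payload when the vector column is not requested.

```python
from typing import Dict, Iterator, List, Tuple


class ToyPrunedRelation:
    """Toy stand-in for a relation that supports a pruned scan (not Spark)."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files  # hypothetical in-memory "file system"
        self.loads = 0       # counts how often a payload is decoded

    def _load_vector(self, path: str) -> List[float]:
        # The expensive step: read and decode the whole binary payload.
        self.loads += 1
        return [float(b) for b in self._files[path]]

    def build_scan(self, required_columns: List[str]) -> Iterator[Tuple]:
        # Only the columns named in required_columns are materialized.
        producers = {"path": lambda p: p, "features": self._load_vector}
        for path in sorted(self._files):
            yield tuple(producers[c](path) for c in required_columns)


relation = ToyPrunedRelation({"a.bin": b"\x01\x02", "b.bin": b"\x03"})

# A scan that needs only the path column never decodes a payload.
paths_only = list(relation.build_scan(["path"]))
loads_after_pruned_scan = relation.loads

# A scan that needs the vector column decodes each file once.
full_rows = list(relation.build_scan(["path", "features"]))
```

If the required-column list is not honored, every scan pays the cost of the second case, even when a pipeline stage only needs the cheap columns.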
Eron
