[ 
https://issues.apache.org/jira/browse/SPARK-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511745#comment-16511745
 ] 

Joseph K. Bradley commented on SPARK-22666:
-------------------------------------------

Side note: The Java library we use for reading images does not support as many 
image types as PIL, which we used in the image loader in 
https://github.com/databricks/spark-deep-learning  This should not block this 
task, but it'd be worth poking around to see if there are alternative libraries 
we could use from Java.

> Spark datasource for image format
> ---------------------------------
>
>                 Key: SPARK-22666
>                 URL: https://issues.apache.org/jira/browse/SPARK-22666
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Timothy Hunter
>            Priority: Major
>
> The current API for the new image format is implemented as a standalone 
> feature, in order to make it reside within the mllib package. As discussed in 
> SPARK-21866, users should be able to load images through the more common 
> spark source reader interface.
> This ticket is concerned with adding image reading support in the spark 
> source API, through either of the following interfaces:
>  - {{spark.read.format("image")...}}
>  - {{spark.read.image....}}
> The output is a dataframe that contains images (and the file names for 
> example), following the semantics discussed already in SPARK-21866.
> A few technical notes:
> * since the functionality is implemented in {{mllib}}, calling this function 
> may fail at runtime if users have not imported the {{spark-mllib}} dependency
> * How to deal with very flat directories? It is common to have millions of 
> files in a single "directory" (like in S3), which seems to have caused some 
> issues to some users. If this issue is too complex to handle in this ticket, 
> it can be dealt with separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to