Timothy Hunter created SPARK-22666:
--------------------------------------

             Summary: Spark reader source for image format
                 Key: SPARK-22666
                 URL: https://issues.apache.org/jira/browse/SPARK-22666
             Project: Spark
          Issue Type: Improvement
          Components: ML
    Affects Versions: 2.3.0
            Reporter: Timothy Hunter


The current API for the new image format is implemented as a standalone 
feature so that it can reside within the mllib package. As discussed in 
SPARK-21866, users should also be able to load images through the more common 
Spark data source reader interface.

This ticket is concerned with adding image reading support to the Spark data 
source API, through either of the following interfaces:
 - {{spark.read.format("image")...}}
 - {{spark.read.image....}}
The output is a DataFrame that contains the images (along with, for example, 
the file names), following the semantics already discussed in SPARK-21866.
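
As a rough illustration of the first proposed interface, a minimal Python 
sketch follows. It assumes the reader ends up registered under the format name 
"image" and that the resulting DataFrame exposes an {{image}} struct column 
with an {{origin}} field holding the file name, as in the SPARK-21866 schema. 
The helper names {{load_images}} and {{image_origins}} are hypothetical and 
exist only for this example.

```python
def load_images(spark, path):
    """Load images from `path` through the generic reader interface.

    `spark` is expected to be a SparkSession. The format name "image" is
    the one proposed in this ticket (an assumption until it is implemented).
    """
    return spark.read.format("image").load(path)


def image_origins(df):
    """Select only the file origin of each image.

    Assumes an `image` struct column with an `origin` field, following the
    schema discussed in SPARK-21866.
    """
    return df.select("image.origin")
```

With a live session this would be called as 
{{image_origins(load_images(spark, "/path/to/images"))}}, where the path is a 
placeholder.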

A few technical notes:
* Since the functionality is implemented in {{mllib}}, calling this function 
may fail at runtime if users have not included the {{spark-mllib}} dependency.
* How should very flat directories be handled? It is common to have millions of 
files in a single "directory" (as in S3), which seems to have caused issues 
for some users. If this issue is too complex to handle in this ticket, it 
can be dealt with separately.
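
To make the flat-directory concern concrete, here is one possible mitigation 
sketched in plain Python: iterate over the directory lazily and hand out file 
paths in fixed-size batches, instead of materializing the whole listing at 
once. This is only an illustration for a local filesystem, not how Spark lists 
files. The function name {{iter_file_batches}} and the default batch size are 
hypothetical, and an object store such as S3 would need its own paged listing 
calls.

```python
import os
from itertools import islice


def iter_file_batches(directory, batch_size=10000):
    """Yield lists of at most `batch_size` file paths from `directory`.

    os.scandir returns a lazy iterator, so only one batch of paths is held
    in memory at a time. Subdirectories are skipped. The order of entries
    is whatever the operating system returns.
    """
    with os.scandir(directory) as entries:
        paths = (entry.path for entry in entries if entry.is_file())
        while True:
            batch = list(islice(paths, batch_size))
            if not batch:
                return
            yield batch
```

Each batch could then be processed as an independent unit of work, which 
keeps the size of any single listing bounded.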




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
