[ https://issues.apache.org/jira/browse/SPARK-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360402#comment-16360402 ]
xubo245 commented on SPARK-22666:
---------------------------------
Is this finished, or is it still TODO?
> Spark reader source for image format
> ------------------------------------
>
> Key: SPARK-22666
> URL: https://issues.apache.org/jira/browse/SPARK-22666
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Timothy Hunter
> Priority: Major
>
> The current API for the new image format is implemented as a standalone
> feature, in order to make it reside within the mllib package. As discussed in
> SPARK-21866, users should be able to load images through the more common
> spark source reader interface.
> This ticket is concerned with adding image reading support in the spark
> source API, through either of the following interfaces:
> - {{spark.read.format("image")...}}
> - {{spark.read.image....}}
> The output is a DataFrame that contains the images (and, for example, the
> file names), following the semantics already discussed in SPARK-21866.
> A few technical notes:
> * since the functionality is implemented in {{mllib}}, calling this function
> may fail at runtime if users have not imported the {{spark-mllib}} dependency
> * How should very flat directories be handled? It is common to have millions
> of files in a single "directory" (as in S3), which seems to have caused
> issues for some users. If this is too complex to handle in this ticket, it
> can be dealt with separately.
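The first technical note warns that the reader may fail at runtime when `spark-mllib` is missing. A minimal sketch, assuming a data source wants to fail fast with a readable error instead: probe the classpath for the mllib class before using it. `MllibCheck`, `isOnClasspath` and `requireMllib` are hypothetical names, not part of Spark; the class name `org.apache.spark.ml.image.ImageSchema` is assumed to be the one backing the standalone reader from SPARK-21866.

```scala
object MllibCheck {
  // Assumption: the mllib class behind the standalone image reader (SPARK-21866).
  val ImageSchemaClass: String = "org.apache.spark.ml.image.ImageSchema"

  // Returns true if `className` can be loaded from this object's class loader.
  // `initialize = false` avoids running the class's static initializers.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false // e.g. the class's own dependencies are missing
    }

  // Hypothetical guard: throws IllegalArgumentException with a clear message
  // instead of letting a class-loading error surface deep inside the read path.
  def requireMllib(): Unit =
    require(
      isOnClasspath(ImageSchemaClass),
      s"Reading images requires spark-mllib on the classpath ($ImageSchemaClass not found)"
    )

  def main(args: Array[String]): Unit =
    println(s"spark-mllib present: ${isOnClasspath(ImageSchemaClass)}")
}
```

A source could call `requireMllib()` before building its reader; whether Spark should do this, or simply document the dependency, is a design question the ticket leaves open.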
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)