[ https://issues.apache.org/jira/browse/SPARK-22666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511745#comment-16511745 ]
Joseph K. Bradley commented on SPARK-22666: ------------------------------------------- Side note: The Java library we use for reading images does not support as many image types as PIL, which we used in the image loader in https://github.com/databricks/spark-deep-learning This should not block this task, but it'd be worth poking around to see if there are alternative libraries we could use from Java. > Spark datasource for image format > --------------------------------- > > Key: SPARK-22666 > URL: https://issues.apache.org/jira/browse/SPARK-22666 > Project: Spark > Issue Type: Improvement > Components: ML > Affects Versions: 2.3.0 > Reporter: Timothy Hunter > Priority: Major > > The current API for the new image format is implemented as a standalone > feature, in order to make it reside within the mllib package. As discussed in > SPARK-21866, users should be able to load images through the more common > spark source reader interface. > This ticket is concerned with adding image reading support in the spark > source API, through either of the following interfaces: > - {{spark.read.format("image")...}} > - {{spark.read.image....}} > The output is a dataframe that contains images (and the file names for > example), following the semantics discussed already in SPARK-21866. > A few technical notes: > * since the functionality is implemented in {{mllib}}, calling this function > may fail at runtime if users have not imported the {{spark-mllib}} dependency > * How to deal with very flat directories? It is common to have millions of > files in a single "directory" (like in S3), which seems to have caused some > issues to some users. If this issue is too complex to handle in this ticket, > it can be dealt with separately. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org