[jira] [Updated] (SPARK-20353) Implement Tensorflow TFRecords file format

Sean Owen (JIRA) Mon, 17 Apr 2017 02:17:35 -0700

     [ https://issues.apache.org/jira/browse/SPARK-20353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen updated SPARK-20353:
------------------------------
    Priority: Minor  (was: Major)

I think this is too app-specific to live in Spark, and should just be in a 
third-party library.

> Implement Tensorflow TFRecords file format
> ------------------------------------------
>
>                 Key: SPARK-20353
>                 URL: https://issues.apache.org/jira/browse/SPARK-20353
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, SQL
>    Affects Versions: 2.1.0
>            Reporter: Mathew Wicks
>            Priority: Minor
>
> Spark is a very good preprocessing engine for tools like Tensorflow. However, 
> we lack native support for Tensorflow's core file format, TFRecords.
> There is a project which implements this functionality as an external JAR, 
> but it is not user-friendly or robust enough for production use:
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
> Here is some discussion around the above.
> https://github.com/tensorflow/ecosystem/issues/32
> If we were to implement "tfrecords" as a DataFrame readable/writable format, 
> we would have to account for the various data types that can be present in 
> Spark columns, and which ones are actually useful in Tensorflow. 
> Note: The `spark-tensorflow-connector` described above does not properly 
> support the vector data type. 
> Further discussion of whether this is within the scope of Spark SQL is 
> strongly welcomed.
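
To make the data type question above a little more concrete, here is a rough
sketch of how Spark column types might be mapped onto the three value kinds a
`tf.train.Feature` can hold (`int64_list`, `float_list`, `bytes_list`). The
mapping table, the `feature_kind` helper, and the use of the name "vector" for
the ML vector type are all assumptions made for illustration; this is not the
behaviour of Spark or of `spark-tensorflow-connector`.

```python
# Illustrative sketch only: this mapping is an assumption, not an existing API.

# A tf.train.Feature holds exactly one of these three list kinds.
INT64_LIST = "int64_list"
FLOAT_LIST = "float_list"   # stores 32-bit floats, so doubles would be narrowed
BYTES_LIST = "bytes_list"

# Hypothetical mapping from Spark SQL simple type names to Feature kinds.
SCALAR_KINDS = {
    "boolean": INT64_LIST,
    "tinyint": INT64_LIST,
    "smallint": INT64_LIST,
    "int": INT64_LIST,
    "bigint": INT64_LIST,
    "float": FLOAT_LIST,
    "double": FLOAT_LIST,
    "string": BYTES_LIST,
    "binary": BYTES_LIST,
}


def feature_kind(spark_type: str) -> str:
    """Return the Feature kind a column of `spark_type` could be written as."""
    t = spark_type.strip().lower()
    # Arrays of scalars fit naturally, since every Feature kind is a list.
    if t.startswith("array<") and t.endswith(">"):
        element = t[len("array<"):-1].strip()
        if element in SCALAR_KINDS:
            return SCALAR_KINDS[element]
        raise ValueError(f"unsupported array element type: {element!r}")
    # Assumption: ML vector columns are flattened to a list of floats.
    if t == "vector":
        return FLOAT_LIST
    if t in SCALAR_KINDS:
        return SCALAR_KINDS[t]
    # Maps, structs and nested arrays have no obvious flat representation.
    raise ValueError(f"no obvious Feature kind for Spark type {spark_type!r}")


if __name__ == "__main__":
    for name in ["bigint", "double", "string", "array<int>", "vector"]:
        print(name, "->", feature_kind(name))
```

Types with no flat representation (maps, structs, nested arrays) are rejected
in this sketch rather than guessed at, which is one of the design decisions a
real implementation would have to settle.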



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

