Jim Huang created SPARK-31276:
---------------------------------

             Summary: Contrived working example that works with multiple URI 
file storages for Spark cluster mode
                 Key: SPARK-31276
                 URL: https://issues.apache.org/jira/browse/SPARK-31276
             Project: Spark
          Issue Type: Wish
          Components: Examples
    Affects Versions: 2.4.5
            Reporter: Jim Huang


The Spark SQL Guide section Data Sources --> Generic Load/Save Functions

[https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html]

describes only a very simple case: loading an example file from the local file system.

 

I am looking for an example that demonstrates a workflow exercising 
different file systems.  For example:
 # The driver loads an input file from the local file system.
 # A simple column is added using lit(), and the resulting DataFrame is stored 
to HDFS in cluster mode.
 # That same final DataFrame is written back to the driver's local file system.
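
The three steps above could be sketched in PySpark roughly as follows. This is only a sketch under assumptions, not a tested cluster recipe: the helper names (file_uri, hdfs_uri, run_workflow), the column name "source", the CSV/Parquet formats, and the namenode address are all hypothetical. It also assumes the local paths are visible to whichever processes actually perform the I/O. In cluster mode a file:// path is read and written by the executors, so the write in step 3 only lands on the driver's machine when driver and executors share that file system (for example in local mode).

```python
from pathlib import PurePosixPath


def file_uri(path):
    """Turn an absolute POSIX path into an explicit file:// URI."""
    # as_uri() raises ValueError for relative paths, which is what we want here.
    return PurePosixPath(path).as_uri()


def hdfs_uri(namenode, path):
    """Build an explicit hdfs:// URI; `namenode` is a hypothetical host:port."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return f"hdfs://{namenode}{path}"


def run_workflow(spark, local_in, hdfs_out, local_out):
    """Hypothetical helper: local CSV -> add column -> HDFS -> local CSV."""
    # Imported here so the URI helpers above work without PySpark installed.
    from pyspark.sql.functions import lit

    # Step 1: read from the local file system via an explicit file:// URI.
    df = spark.read.csv(file_uri(local_in), header=True)

    # Step 2: add a constant column with lit() and store the result on HDFS.
    final_df = df.withColumn("source", lit("example"))
    final_df.write.mode("overwrite").parquet(hdfs_out)

    # Step 3: write the same DataFrame to a file:// URI.
    # Spark writes a directory of part files here, not a single file.
    final_df.write.mode("overwrite").csv(file_uri(local_out), header=True)
    return final_df
```

A call might look like run_workflow(spark, "/tmp/in.csv", hdfs_uri("namenode:8020", "/data/out"), "/tmp/out"), where every path and the namenode address are placeholders.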

 

The examples I found on the internet use only simple paths without 
explicit URI prefixes.

Without an explicit URI prefix, a "filepath" inherits the way Spark was 
launched: local (standalone) mode vs. cluster mode.  So, without explicit URIs, 
the same "filepath" is read or written on the local file system in local mode 
and on HDFS in cluster mode.
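
To illustrate the behaviour described above: as far as I understand, a path without a scheme is qualified against the Hadoop default file system (the fs.defaultFS setting), which is file:/// unless a cluster configuration overrides it. The function below is a simplified pure-Python model of that rule, not Spark's or Hadoop's actual code; the name resolve and the example addresses are hypothetical.

```python
from urllib.parse import urlparse


def resolve(path, default_fs):
    """Simplified model: qualify `path` against a default file system URI."""
    # A path that already carries a scheme (file://, hdfs://, ...) is kept as-is.
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    # Otherwise borrow scheme and authority from the default file system.
    fs = urlparse(default_fs)
    return f"{fs.scheme}://{fs.netloc}{path}"


# Same scheme-less path, two different outcomes depending on the default.
print(resolve("/tmp/data.csv", "file:///"))
print(resolve("/tmp/data.csv", "hdfs://namenode:8020"))
# An explicit URI is unaffected by the default.
print(resolve("file:///tmp/data.csv", "hdfs://namenode:8020"))
```

This is why the same program can touch different storage depending on how it was launched, and why spelling out the scheme removes the ambiguity.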

There are situations where a Spark program needs to deal with both the local 
file system and cluster-mode (big data) storage in the same Spark application, 
such as producing a summary table stored on the driver's local file system at 
the end.  
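
For the summary-table case, one approach that may work is to bring the (small) result to the driver with collect() and write it with ordinary Python file I/O, since that code runs in the driver process. The sketch below assumes the summary fits comfortably in driver memory; write_summary is a hypothetical helper, and it accepts any sequence of tuples, which is what DataFrame.collect() returns (Row is a tuple subclass).

```python
import csv


def write_summary(rows, columns, path):
    """Write already-collected rows to a CSV file on the local file system.

    `rows` is a sequence of tuples, e.g. the result of df.collect();
    `columns` is a list of column names, e.g. df.columns.
    Returns the number of data rows written.
    """
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

With a DataFrame named summary_df (hypothetical), the call would be write_summary(summary_df.collect(), summary_df.columns, "/tmp/summary.csv"). Note that in cluster deploy mode the driver itself runs on a cluster node, so "local" then means that node's file system.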

Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
