[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16916033#comment-16916033
 ] 

Chamikara Jayalath commented on BEAM-8089:
------------------------------------------

True, it seems this is supported in a limited way (wildcards are not 
supported, for example).

I think Beam will have a hard time supporting this, since most Beam runners 
are distributed and use multiple nodes to write data (to files) in parallel, 
so there is no "single" local disk. This is why we use a distributed storage 
location that all workers can write individual files to (a directory in GCS 
in this case) and then execute a single BQ load job for all the files from 
there.
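
As a rough illustration of the point above, a pipeline could reject a local path before it ever reaches BigQueryIO. This is only a sketch: `TempLocationCheck`, `isGcsPath` and `requireGcsPath` are hypothetical helpers (not part of Beam), and `gs://my-bucket/bq-temp` is a placeholder bucket.

```java
/** Hypothetical helper (not part of Beam): checks that a temp location is a GCS path. */
final class TempLocationCheck {
    private static final String GCS_PREFIX = "gs://";

    private TempLocationCheck() {}

    /** Returns true only for locations of the form gs://<bucket>[/<path>]. */
    static boolean isGcsPath(String location) {
        if (location == null || !location.startsWith(GCS_PREFIX)) {
            return false;
        }
        String rest = location.substring(GCS_PREFIX.length());
        int slash = rest.indexOf('/');
        String bucket = slash < 0 ? rest : rest.substring(0, slash);
        return !bucket.isEmpty();
    }

    /** Returns the location unchanged, or throws if it is not a GCS path. */
    static String requireGcsPath(String location) {
        if (!isGcsPath(location)) {
            throw new IllegalArgumentException(
                "Temp location must be a gs:// path that all workers can reach, got: " + location);
        }
        return location;
    }

    public static void main(String[] args) {
        System.out.println(isGcsPath("gs://my-bucket/bq-temp"));   // prints "true"
        System.out.println(isGcsPath("/home/harshit/testFiles")); // prints "false"
    }
}
```

The validated value could then be handed to the write transform, for example as `.withCustomGcsTempLocation(ValueProvider.StaticValueProvider.of(TempLocationCheck.requireGcsPath("gs://my-bucket/bq-temp")))`.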

> Error while using customGcsTempLocation() with Dataflow
> -------------------------------------------------------
>
>                 Key: BEAM-8089
>                 URL: https://issues.apache.org/jira/browse/BEAM-8089
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.13.0
>            Reporter: Harshit Dwivedi
>            Assignee: Chamikara Jayalath
>            Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider<String>() {
>      @Override
>      public String get() {
>          return "/home/harshit/testFiles";
>      }
>      @Override
>      public boolean isAccessible() {
>          return true;
>      }
>  })
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)