[ 
https://issues.apache.org/jira/browse/BEAM-4225?focusedWorklogId=115475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-115475
 ]

ASF GitHub Bot logged work on BEAM-4225:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Jun/18 15:45
            Start Date: 25/Jun/18 15:45
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on issue #4976: [BEAM-4225] Add 
Nexmark PostCommit runs for spark, flink and direct runner and export to 
Bigquery
URL: https://github.com/apache/beam/pull/4976#issuecomment-399999193
 
 
   BigQueryIO relies on the gcpTempLocation pipeline option.
   If gcpTempLocation is unset, it falls back to tempLocation.
   gcpTempLocation must be a Google Cloud Storage location.
   
   If you want/need tempLocation to differ from gcpTempLocation, you should be
   able to set both properties.
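
The fallback rule above can be sketched in plain Java. This is only a model of the behaviour as described in this comment, not Beam's own code: the class `TempLocationSketch`, the method `resolveGcpTempLocation`, and the bucket names are hypothetical, and the `gs://` prefix check is an assumed stand-in for "must be a Google Cloud Storage location".

```java
/** Hypothetical model of the fallback rule described above; not Beam's implementation. */
final class TempLocationSketch {

  /**
   * Returns gcpTempLocation when it is set, otherwise tempLocation.
   * Assumption: a "Google Cloud Storage location" is any path starting with gs://.
   */
  static String resolveGcpTempLocation(String gcpTempLocation, String tempLocation) {
    boolean gcpSet = gcpTempLocation != null && !gcpTempLocation.isEmpty();
    String resolved = gcpSet ? gcpTempLocation : tempLocation;
    if (resolved == null || !resolved.startsWith("gs://")) {
      throw new IllegalArgumentException(
          "BigQuery temp files need a Google Cloud Storage path, got: " + resolved);
    }
    return resolved;
  }

  public static void main(String[] args) {
    // Both options set: tempLocation may be a local path, gcpTempLocation wins.
    System.out.println(resolveGcpTempLocation("gs://my-bucket/bq-tmp", "/tmp/beam"));
    // Only tempLocation set: it is used as the fallback and must itself be on GCS.
    System.out.println(resolveGcpTempLocation(null, "gs://my-bucket/tmp"));
  }
}
```

Under this model, a pipeline that keeps tempLocation on local disk has to set gcpTempLocation explicitly, which matches the advice to set both properties.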
   
   BigQueryIO performs a read by having BigQuery export the contents of a
   table or query to a set of files. BigQuery requires those files to exist in
   Google Cloud Storage. Having BigQuery export the data allows for a much
   faster read since we can parallelize the read of the files using dynamic
   work rebalancing. It also gives us a stable view of the table/query. The
   pipeline kicks off the BigQuery export, waits for the export to finish,
   reads those files and then deletes them.
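
As a rough illustration of the export, wait, read, delete sequence, here is a self-contained sketch that uses an in-memory map in place of Google Cloud Storage. Everything in it (`ExportReadSketch`, `FakeStorage`, `exportToFiles`, `readTable`, the shard file names) is hypothetical and is not a Beam or BigQuery API. It only shows the shape of the lifecycle, including cleaning up the temporary files even if the read fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the export-then-read flow; none of these names are real APIs. */
final class ExportReadSketch {

  /** In-memory stand-in for a GCS temp directory: file name -> rows. */
  static final class FakeStorage {
    final Map<String, List<String>> files = new HashMap<>();
  }

  /** Stand-in for an export job: writes a snapshot of the rows into shard files. */
  static List<String> exportToFiles(List<String> tableRows, int shards, FakeStorage storage) {
    if (shards < 1) {
      throw new IllegalArgumentException("shards must be >= 1");
    }
    List<String> names = new ArrayList<>();
    for (int s = 0; s < shards; s++) {
      String name = "export-" + s + ".avro";
      storage.files.put(name, new ArrayList<>());
      names.add(name);
    }
    for (int i = 0; i < tableRows.size(); i++) {
      storage.files.get(names.get(i % shards)).add(tableRows.get(i));
    }
    return names;
  }

  /** Export, read every shard, then delete the temporary files. */
  static List<String> readTable(List<String> tableRows, int shards, FakeStorage storage) {
    // In this sketch the call returns only once the "export job" has finished.
    List<String> exported = exportToFiles(tableRows, shards, storage);
    try {
      List<String> result = new ArrayList<>();
      for (String name : exported) { // a runner could read these shards in parallel
        result.addAll(storage.files.get(name));
      }
      return result;
    } finally {
      for (String name : exported) {
        storage.files.remove(name); // clean up even if reading failed
      }
    }
  }
}
```

Because the rows are copied into files before any reading starts, later changes to the source table would not affect the read, which is the "stable view" mentioned above.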
   
   BigQueryIO has two ways of writing to BigQuery. The first imports data
   from Avro files. BigQuery requires those files to be materialized in a
   Google Cloud Storage location, so the pipeline writes the files, kicks off
   an import job in BigQuery, waits for the import to complete, and then
   deletes the temporary files. The second is a streaming inserts mode that
   calls the BigQuery insert API for each batch of records. The file import
   mode is significantly faster, which is why it is used whenever possible.
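
A minimal sketch of the choice between the two write paths follows. It assumes that the file import mode is "possible" when the input is bounded, which this comment does not state. `WriteMethodSketch`, `choose`, and `batches` are hypothetical names used for illustration, not BigQueryIO's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of picking a write path; not Beam's implementation. */
final class WriteMethodSketch {

  enum WriteMethod { FILE_LOADS, STREAMING_INSERTS }

  /** Assumption: file loads are possible for bounded input, otherwise stream. */
  static WriteMethod choose(boolean inputIsBounded) {
    return inputIsBounded ? WriteMethod.FILE_LOADS : WriteMethod.STREAMING_INSERTS;
  }

  /** Splits records into batches; streaming mode would issue one insert call per batch. */
  static <T> List<List<T>> batches(List<T> records, int batchSize) {
    if (batchSize < 1) {
      throw new IllegalArgumentException("batchSize must be >= 1");
    }
    List<List<T>> out = new ArrayList<>();
    for (int i = 0; i < records.size(); i += batchSize) {
      int end = Math.min(i + batchSize, records.size());
      out.add(new ArrayList<>(records.subList(i, end)));
    }
    return out;
  }
}
```

In this model, five records with a batch size of two produce three batches, so the streaming path would make three insert calls where the file import path would run a single import job.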
   
   On Mon, Jun 25, 2018 at 7:15 AM Kenn Knowles <[email protected]>
   wrote:
   
   > It should default to tempLocation which is not specific to any runner or
   > platform. That error message does not occur in our code base. My guess: you
   > are writing to BigQuery, which works by writing Avro files to GCS and then
   > bulk importing the data.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 115475)
    Time Spent: 5h 20m  (was: 5h 10m)

> Integrate Nexmark with perfkit dashboards
> -----------------------------------------
>
>                 Key: BEAM-4225
>                 URL: https://issues.apache.org/jira/browse/BEAM-4225
>             Project: Beam
>          Issue Type: Improvement
>          Components: examples-nexmark
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>          Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The aim is to run Nexmark as post-commits and export the results to the 
> performance dashboards:
> see the threads:
> [https://lists.apache.org/thread.html/9f8fe1c6df7d8bfe2697332e69722ca4edd2810adc6a914cdf32da29@%3Cdev.beam.apache.org%3E]
> https://lists.apache.org/thread.html/701196efd6e74b7715785d43019a4a73e8a093997f59662fdadf8f2a@%3Cdev.beam.apache.org%3E



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
