[
https://issues.apache.org/jira/browse/BEAM-4283?focusedWorklogId=109008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-109008
]
ASF GitHub Bot logged work on BEAM-4283:
----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Jun/18 09:52
Start Date: 05/Jun/18 09:52
Worklog Time Spent: 10m
Work Description: echauchot commented on a change in pull request #5464:
[BEAM-4283] Write Nexmark execution times to bigquery
URL: https://github.com/apache/beam/pull/5464#discussion_r193012811
##########
File path: sdks/java/nexmark/src/main/java/org/apache/beam/sdk/nexmark/Main.java
##########
@@ -74,22 +96,89 @@ void runAll(OptionT options, NexmarkLauncher nexmarkLauncher) throws IOException
appendPerf(options.getPerfFilename(), configuration, perf);
actual.put(configuration, perf);
// Summarize what we've run so far.
- saveSummary(null, configurations, actual, baseline, start);
+ saveSummary(null, configurations, actual, baseline, start, options);
}
}
+ if (options.getExportSummaryToBigQuery()){
+ savePerfsToBigQuery(options, actual, null);
+ }
} finally {
if (options.getMonitorJobs()) {
// Report overall performance.
- saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start);
+ saveSummary(options.getSummaryFilename(), configurations, actual, baseline, start, options);
saveJavascript(options.getJavascriptFilename(), configurations, actual, baseline, start);
}
}
-
if (!successful) {
throw new RuntimeException("Execution was not successful");
}
}
+ @VisibleForTesting
+ static void savePerfsToBigQuery(
+ NexmarkOptions options,
+ Map<NexmarkConfiguration, NexmarkPerf> perfs,
+ @Nullable FakeBigQueryServices fakeBigQueryServices) {
+ Pipeline pipeline = Pipeline.create(options);
Review comment:
Yes, it is technically feasible to create a new PipelineOptions with runner
== DirectRunner and use it in the second pipeline. What I'm not convinced
about is that it would require shipping the direct runner libs in nexmark even
when we are running the queries on another runner. It might be problematic to
have them in the fat jar deployed on a Spark cluster, for example.
Currently, we only ship the direct runner libs in the classpath when we run JVM
local tests on the direct runner (profile).
Please note that the other BigQuery additions to nexmark (sinking the output
PCollection to BigQuery) currently run with the same runner as the queries,
in the same pipeline.
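To make the classpath concern concrete, here is a minimal sketch in plain Java (no Beam dependency) of how export code could detect whether the direct runner libs were shipped, and otherwise stay on the runner already configured for the queries. This is not what the PR does: the class RunnerClasspathCheck and its methods are hypothetical names introduced only for illustration, and the runner is represented by a plain string rather than a real PipelineOptions.

```java
// Hypothetical helper, for illustration only; not part of nexmark or Beam.
final class RunnerClasspathCheck {

  // Fully qualified name of Beam's direct runner class.
  static final String DIRECT_RUNNER = "org.apache.beam.runners.direct.DirectRunner";

  // Returns true if the named class can be located, without initializing it.
  static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, RunnerClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  // Uses the direct runner for the export only if its libs were shipped;
  // otherwise keeps the runner that is already configured for the queries.
  static String chooseExportRunner(String configuredRunner) {
    return isOnClasspath(DIRECT_RUNNER) ? "DirectRunner" : configuredRunner;
  }

  public static void main(String[] args) {
    // "SparkRunner" stands in for whatever runner the queries were launched with.
    System.out.println("Export would run on: " + chooseExportRunner("SparkRunner"));
  }
}
```

With a fat jar built without the direct runner profile, the lookup fails and the export stays on the configured runner, which matches the current behaviour of the other BigQuery additions.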
Issue Time Tracking
-------------------
Worklog Id: (was: 109008)
Time Spent: 4.5h (was: 4h 20m)
> Export nexmark execution times to bigQuery
> ------------------------------------------
>
> Key: BEAM-4283
> URL: https://issues.apache.org/jira/browse/BEAM-4283
> Project: Beam
> Issue Type: Sub-task
> Components: examples-nexmark
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> Nexmark only outputs the results collection to bigQuery and prints in the
> console the execution times. To supervise Nexmark execution times, we need to
> store them as well per runner/query/mode