See <https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/1580/display/redirect?page=changes>
Changes:

[echauchot] [BEAM-8470] Add an empty spark-structured-streaming runner project
[echauchot] [BEAM-8470] Fix missing dep
[echauchot] [BEAM-8470] Add SparkPipelineOptions
[echauchot] [BEAM-8470] Start pipeline translation
[echauchot] [BEAM-8470] Add global pipeline translation structure
[echauchot] [BEAM-8470] Add nodes translators structure
[echauchot] [BEAM-8470] Wire node translators with pipeline translator
[echauchot] [BEAM-8470] Renames: better differenciate pipeline translator for
[echauchot] [BEAM-8470] Organise methods in PipelineTranslator
[echauchot] [BEAM-8470] Initialise BatchTranslationContext
[echauchot] [BEAM-8470] Refactoring: -move batch/streaming common translation
[echauchot] [BEAM-8470] Make transform translation clearer: renaming, comments
[echauchot] [BEAM-8470] Improve javadocs
[echauchot] [BEAM-8470] Move SparkTransformOverrides to correct package
[echauchot] [BEAM-8470] Move common translation context components to superclass
[echauchot] [BEAM-8470] apply spotless
[echauchot] [BEAM-8470] Make codestyle and firebug happy
[echauchot] [BEAM-8470] Add TODOs
[echauchot] [BEAM-8470] Post-pone batch qualifier in all classes names for
[echauchot] [BEAM-8470] Add precise TODO for multiple TransformTranslator per
[echauchot] [BEAM-8470] Added SparkRunnerRegistrar
[echauchot] [BEAM-8470] Add basic pipeline execution. Refactor translatePipeline()
[echauchot] [BEAM-8470] Create PCollections manipulation methods
[echauchot] [BEAM-8470] Create Datasets manipulation methods
[echauchot] [BEAM-8470] Add Flatten transformation translator
[echauchot] [BEAM-8470] Add primitive GroupByKeyTranslatorBatch implementation
[echauchot] [BEAM-8470] Use Iterators.transform() to return Iterable
[echauchot] [BEAM-8470] Implement read transform
[echauchot] [BEAM-8470] update TODO
[echauchot] [BEAM-8470] Apply spotless
[echauchot] [BEAM-8470] start source instanciation
[echauchot] [BEAM-8470] Improve exception flow
[echauchot] [BEAM-8470] Improve type enforcement in ReadSourceTranslator
[echauchot] [BEAM-8470] Experiment over using spark Catalog to pass in Beam Source
[echauchot] [BEAM-8470] Add source mocks
[echauchot] [BEAM-8470] fix mock, wire mock in translators and create a main test.
[echauchot] [BEAM-8470] Use raw WindowedValue so that spark Encoders could work
[echauchot] [BEAM-8470] clean deps
[echauchot] [BEAM-8470] Move DatasetSourceMock to proper batch mode
[echauchot] [BEAM-8470] Run pipeline in batch mode or in streaming mode
[echauchot] [BEAM-8470] Split batch and streaming sources and translators
[echauchot] [BEAM-8470] Use raw Encoder<WindowedValue> also in regular
[echauchot] [BEAM-8470] Clean
[echauchot] [BEAM-8470] Add ReadSourceTranslatorStreaming
[echauchot] [BEAM-8470] Move Source and translator mocks to a mock package.
[echauchot] [BEAM-8470] Pass Beam Source and PipelineOptions to the spark DataSource
[echauchot] [BEAM-8470] Refactor DatasetSource fields
[echauchot] [BEAM-8470] Wire real SourceTransform and not mock and update the test
[echauchot] [BEAM-8470] Add missing 0-arg public constructor
[echauchot] [BEAM-8470] Use new PipelineOptionsSerializationUtils
[echauchot] [BEAM-8470] Apply spotless and fix checkstyle
[echauchot] [BEAM-8470] Add a dummy schema for reader
[echauchot] [BEAM-8470] Add empty 0-arg constructor for mock source
[echauchot] [BEAM-8470] Clean
[echauchot] [BEAM-8470] Checkstyle and Findbugs
[echauchot] [BEAM-8470] Refactor SourceTest to a UTest instaed of a main
[echauchot] [BEAM-8470] Fix pipeline triggering: use a spark action instead of
[echauchot] [BEAM-8470] improve readability of options passing to the source
[echauchot] [BEAM-8470] Clean unneeded fields in DatasetReader
[echauchot] [BEAM-8470] Fix serialization issues
[echauchot] [BEAM-8470] Add SerializationDebugger
[echauchot] [BEAM-8470] Add serialization test
[echauchot] [BEAM-8470] Move SourceTest to same package as tested class
[echauchot] [BEAM-8470] Fix SourceTest
[echauchot] [BEAM-8470] Simplify beam reader creation as it created once the source
[echauchot] [BEAM-8470] Put all transform translators Serializable
[echauchot] [BEAM-8470] Enable test mode
[echauchot] [BEAM-8470] Enable gradle build scan
[echauchot] [BEAM-8470] Add flatten test
[echauchot] [BEAM-8470] First attempt for ParDo primitive implementation
[echauchot] [BEAM-8470] Serialize windowedValue to byte[] in source to be able to
[echauchot] [BEAM-8470] Comment schema choices
[echauchot] [BEAM-8470] Fix errorprone
[echauchot] [BEAM-8470] Fix testMode output to comply with new binary schema
[echauchot] [BEAM-8470] Cleaning
[echauchot] [BEAM-8470] Remove bundleSize parameter and always use spark default
[echauchot] [BEAM-8470] Fix split bug
[echauchot] [BEAM-8470] Clean
[echauchot] [BEAM-8470] Add ParDoTest
[echauchot] [BEAM-8470] Address minor review notes
[echauchot] [BEAM-8470] Clean
[echauchot] [BEAM-8470] Add GroupByKeyTest
[echauchot] [BEAM-8470] Add comments and TODO to GroupByKeyTranslatorBatch
[echauchot] [BEAM-8470] Fix type checking with Encoder of WindowedValue<T>
[echauchot] [BEAM-8470] Port latest changes of ReadSourceTranslatorBatch to
[echauchot] [BEAM-8470] Remove no more needed putDatasetRaw
[echauchot] [BEAM-8470] Add ComplexSourceTest
[echauchot] [BEAM-8470] Fail in case of having SideInouts or State/Timers
[echauchot] [BEAM-8470] Fix Encoders: create an Encoder for every manipulated type
[echauchot] [BEAM-8470] Apply spotless
[echauchot] [BEAM-8470] Fixed Javadoc error
[echauchot] [BEAM-8470] Rename SparkSideInputReader class and rename pruneOutput()
[echauchot] [BEAM-8470] Don't use deprecated
[echauchot] [BEAM-8470] Simplify logic of ParDo translator
[echauchot] [BEAM-8470] Fix kryo issue in GBK translator with a workaround
[echauchot] [BEAM-8470] Rename SparkOutputManager for consistency
[echauchot] [BEAM-8470] Fix for test elements container in GroupByKeyTest
[echauchot] [BEAM-8470] Added "testTwoPardoInRow"
[echauchot] [BEAM-8470] Add a test for the most simple possible Combine
[echauchot] [BEAM-8470] Rename SparkDoFnFilterFunction to DoFnFilterFunction for
[echauchot] [BEAM-8470] Generalize the use of SerializablePipelineOptions in place
[echauchot] [BEAM-8470] Fix getSideInputs
[echauchot] [BEAM-8470] Extract binary schema creation in a helper class
[echauchot] [BEAM-8470] First version of combinePerKey
[echauchot] [BEAM-8470] Improve type checking of Tuple2 encoder
[echauchot] [BEAM-8470] Introduce WindowingHelpers (and helpers package) and use it
[echauchot] [BEAM-8470] Fix combiner using KV as input, use binary encoders in place
[echauchot] [BEAM-8470] Add combinePerKey and CombineGlobally tests
[echauchot] [BEAM-8470] Introduce RowHelpers
[echauchot] [BEAM-8470] Add CombineGlobally translation to avoid translating
[echauchot] [BEAM-8470] Cleaning
[echauchot] [BEAM-8470] Get back to classes in translators resolution because URNs
[echauchot] [BEAM-8470] Fix various type checking issues in Combine.Globally
[echauchot] [BEAM-8470] Update test with Long
[echauchot] [BEAM-8470] Fix combine. For unknown reason GenericRowWithSchema is used
[echauchot] [BEAM-8470] Use more generic Row instead of GenericRowWithSchema
[echauchot] [BEAM-8470] Add explanation about receiving a Row as input in the
[echauchot] [BEAM-8470] Fix encoder bug in combinePerkey
[echauchot] [BEAM-8470] Cleaning
[echauchot] [BEAM-8470] Implement WindowAssignTranslatorBatch
[echauchot] [BEAM-8470] Implement WindowAssignTest
[echauchot] [BEAM-8470] Fix javadoc
[echauchot] [BEAM-8470] Added SideInput support
[echauchot] [BEAM-8470] Fix CheckStyle violations
[echauchot] [BEAM-8470] Don't use Reshuffle translation
[echauchot] [BEAM-8470] Added using CachedSideInputReader
[echauchot] [BEAM-8470] Added TODO comment for ReshuffleTranslatorBatch
[echauchot] [BEAM-8470] And unchecked warning suppression
[echauchot] [BEAM-8470] Add streaming source initialisation
[echauchot] [BEAM-8470] Implement first streaming source
[echauchot] [BEAM-8470] Add a TODO on spark output modes
[echauchot] [BEAM-8470] Add transformators registry in PipelineTranslatorStreaming
[echauchot] [BEAM-8470] Add source streaming test
[echauchot] [BEAM-8470] Specify checkpointLocation at the pipeline start
[echauchot] [BEAM-8470] Clean unneeded 0 arg constructor in batch source
[echauchot] [BEAM-8470] Clean streaming source
[echauchot] [BEAM-8470] Continue impl of offsets for streaming source
[echauchot] [BEAM-8470] Deal with checkpoint and offset based read
[echauchot] [BEAM-8470] Apply spotless and fix spotbugs warnings
[echauchot] [BEAM-8470] Disable never ending test
[echauchot] [BEAM-8470] Fix access level issues, typos and modernize code to Java 8
[echauchot] [BEAM-8470] Merge Spark Structured Streaming runner into main Spark
[echauchot] [BEAM-8470] Fix non-vendored imports from Spark Streaming Runner classes
[echauchot] [BEAM-8470] Pass doFnSchemaInformation to ParDo batch translation
[echauchot] [BEAM-8470] Fix spotless issues after rebase
[echauchot] [BEAM-8470] Fix logging levels in Spark Structured Streaming translation
[echauchot] [BEAM-8470] Add SparkStructuredStreamingPipelineOptions and
[echauchot] [BEAM-8470] Rename SparkPipelineResult to
[echauchot] [BEAM-8470] Use PAssert in Spark Structured Streaming transform tests
[echauchot] [BEAM-8470] Ignore spark offsets (cf javadoc)
[echauchot] [BEAM-8470] implement source.stop
[echauchot] [BEAM-8470] Update javadoc
[echauchot] [BEAM-8470] Apply Spotless
[echauchot] [BEAM-8470] Enable batch Validates Runner tests for Structured Streaming
[echauchot] [BEAM-8470] Limit the number of partitions to make tests go 300% faster
[echauchot] [BEAM-8470] Fixes ParDo not calling setup and not tearing down if
[echauchot] [BEAM-8470] Pass transform based doFnSchemaInformation in ParDo
[echauchot] [BEAM-8470] Consider null object case on RowHelpers, fixes empty side
[echauchot] [BEAM-8470] Put back batch/simpleSourceTest.testBoundedSource
[echauchot] [BEAM-8470] Update windowAssignTest
[echauchot] [BEAM-8470] Add comment about checkpoint mark
[echauchot] [BEAM-8470] Re-code GroupByKeyTranslatorBatch to conserve windowing
[echauchot] [BEAM-8470] re-enable reduceFnRunner timers for output
[echauchot] [BEAM-8470] Improve visibility of debug messages
[echauchot] [BEAM-8470] Add a test that GBK preserves windowing
[echauchot] [BEAM-8470] Add TODO in Combine translations
[echauchot] [BEAM-8470] Update KVHelpers.extractKey() to deal with WindowedValue and
[echauchot] [BEAM-8470] Fix comment about schemas
[echauchot] [BEAM-8470] Implement reduce part of CombineGlobally translation with
[echauchot] [BEAM-8470] Output data after combine
[echauchot] [BEAM-8470] Implement merge accumulators part of CombineGlobally
[echauchot] [BEAM-8470] Fix encoder in combine call
[echauchot] [BEAM-8470] Revert extractKey while combinePerKey is not done (so that
[echauchot] [BEAM-8470] Apply a groupByKey avoids for some reason that the spark
[echauchot] [BEAM-8470] Fix case when a window does not merge into any other window
[echauchot] [BEAM-8470] Fix wrong encoder in combineGlobally GBK
[echauchot] [BEAM-8470] Fix bug in the window merging logic
[echauchot] [BEAM-8470] Remove the mapPartition that adds a key per partition
[echauchot] [BEAM-8470] Remove CombineGlobally translation because it is less
[echauchot] [BEAM-8470] Now that there is only Combine.PerKey translation, make only
[echauchot] [BEAM-8470] Clean no more needed KVHelpers
[echauchot] [BEAM-8470] Clean not more needed RowHelpers
[echauchot] [BEAM-8470] Clean not more needed WindowingHelpers
[echauchot] [BEAM-8470] Fix javadoc of AggregatorCombiner
[echauchot] [BEAM-8470] Fixed immutable list bug
[echauchot] [BEAM-8470] add comment in combine globally test
[echauchot] [BEAM-8470] Clean groupByKeyTest
[echauchot] [BEAM-8470] Add a test that combine per key preserves windowing
[echauchot] [BEAM-8470] Ignore for now not working test testCombineGlobally
[echauchot] [BEAM-8470] Add metrics support in DoFn
[echauchot] [BEAM-8470] Add missing dependencies to run Spark Structured Streaming
[echauchot] [BEAM-8470] Add setEnableSparkMetricSinks() method
[echauchot] [BEAM-8470] Fix javadoc
[echauchot] [BEAM-8470] Fix accumulators initialization in Combine that prevented
[echauchot] [BEAM-8470] Add a test to check that CombineGlobally preserves windowing
[echauchot] [BEAM-8470] Persist all output Dataset if there are multiple outputs in
[echauchot] [BEAM-8470] Added metrics sinks and tests
[echauchot] [BEAM-8470] Make spotless happy
[echauchot] [BEAM-8470] Add PipelineResults to Spark structured streaming.
[echauchot] [BEAM-8470] Update log4j configuration
[echauchot] [BEAM-8470] Add spark execution plans extended debug messages.
[echauchot] [BEAM-8470] Print number of leaf datasets
[echauchot] [BEAM-8470] fixup! Add PipelineResults to Spark structured streaming.
[echauchot] [BEAM-8470] Remove no more needed AggregatorCombinerPerKey (there is
[echauchot] [BEAM-8470] After testing performance and correctness, launch pipeline
[echauchot] [BEAM-8470] Improve Pardo translation performance: avoid calling a
[echauchot] [BEAM-8470] Use "sparkMaster" in local mode to obtain number of shuffle
[echauchot] [BEAM-8470] Wrap Beam Coders into Spark Encoders using
[echauchot] [BEAM-8470] Wrap Beam Coders into Spark Encoders using
[echauchot] [BEAM-8470] type erasure: spark encoders require a Class<T>, pass Object
[echauchot] [BEAM-8470] Fix scala Product in Encoders to avoid StackEverflow
[echauchot] [BEAM-8470] Conform to spark ExpressionEncoders: pass classTags,
[echauchot] [BEAM-8470] Add a simple spark native test to test Beam coders wrapping
[echauchot] [BEAM-8470] Fix code generation in Beam coder wrapper
[echauchot] [BEAM-8470] Lazy init coder because coder instance cannot be
[echauchot] [BEAM-8470] Fix warning in coder construction by reflexion
[echauchot] [BEAM-8470] Fix ExpressionEncoder generated code: typos, try catch, fqcn
[echauchot] [BEAM-8470] Fix getting the output value in code generation
[echauchot] [BEAM-8470] Fix beam coder lazy init using reflexion: use .clas + try
[echauchot] [BEAM-8470] Remove lazy init of beam coder because there is no generic
[echauchot] [BEAM-8470] Remove example code
[echauchot] [BEAM-8470] Fix equal and hashcode
[echauchot] [BEAM-8470] Fix generated code: uniform exceptions catching, fix
[echauchot] [BEAM-8470] Add an assert of equality in the encoders test
[echauchot] [BEAM-8470] Apply spotless and checkstyle and add javadocs
[echauchot] [BEAM-8470] Wrap exceptions in UserCoderExceptions
[echauchot] [BEAM-8470] Put Encoders expressions serializable
[echauchot] [BEAM-8470] Catch Exception instead of IOException because some coders
[echauchot] [BEAM-8470] Apply new Encoders to CombinePerKey
[echauchot] [BEAM-8470] Apply new Encoders to Read source
[echauchot] [BEAM-8470] Improve performance of source: the mapper already calls
[echauchot] [BEAM-8470] Ignore long time failing test: SparkMetricsSinkTest
[echauchot] [BEAM-8470] Apply new Encoders to Window assign translation
[echauchot] [BEAM-8470] Apply new Encoders to AggregatorCombiner
[echauchot] [BEAM-8470] Create a Tuple2Coder to encode scala tuple2
[echauchot] [BEAM-8470] Apply new Encoders to GroupByKey
[echauchot] [BEAM-8470] Apply new Encoders to Pardo. Replace Tuple2Coder with
[echauchot] [BEAM-8470] Apply spotless, fix typo and javadoc
[echauchot] [BEAM-8470] Use beam encoders also in the output of the source
[echauchot] [BEAM-8470] Remove unneeded cast
[echauchot] [BEAM-8470] Fix: Remove generic hack of using object. Use actual Coder
[echauchot] [BEAM-8470] Remove Encoders based on kryo now that we call Beam coders
[echauchot] [BEAM-8470] Add a jenkins job for validates runner tests in the new
[echauchot] [BEAM-8470] Apply spotless
[echauchot] [BEAM-8470] Rebase on master: pass sideInputMapping in SimpleDoFnRunner
[echauchot] Fix SpotBugs
[echauchot] [BEAM-8470] simplify coders in combinePerKey translation
[echauchot] [BEAM-8470] Fix combiner. Do not reuse instance of accumulator
[echauchot] [BEAM-8470] input windows can arrive exploded (for sliding windows). As
[echauchot] [BEAM-8470] Add a combine test with sliding windows
[echauchot] [BEAM-8470] Add a test to test combine translation on binaryCombineFn
[echauchot] [BEAM-8470] Fix tests: use correct
[echauchot] [BEAM-8470] Fix wrong expected results in
[echauchot] [BEAM-8470] Add disclaimers about this runner being experimental
[echauchot] [BEAM-8470] Fix: create an empty accumulator in
[echauchot] [BEAM-8470] Apply spotless
[echauchot] [BEAM-8470] Add a countPerElement test with sliding windows
[echauchot] [BEAM-8470] Fix the output timestamps of combine: timestamps must be
[echauchot] [BEAM-8470] set log level to info to avoid resource consumption in
[echauchot] [BEAM-8470] Fix CombineTest.testCountPerElementWithSlidingWindows
[aromanenko.dev] [BEAM-8470] Remove "validatesStructuredStreamingRunnerBatch" from
[echauchot] [BEAM-8470] Fix timestamps in combine output: assign the timestamp to

------------------------------------------
[...truncated 1.32 MB...]
19/11/20 14:38:35 INFO sdk_worker.create_state_handler: State channel established.
19/11/20 14:38:35 INFO data_plane.create_data_channel: Creating client data channel for localhost:46283
19/11/20 14:38:35 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
19/11/20 14:38:35 INFO org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory: Closing environment urn: "beam:env:process:v1" payload: "\032\202\001https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"
19/11/20 14:38:35 INFO sdk_worker.run: No more requests from control plane
19/11/20 14:38:35 INFO sdk_worker.run: SDK Harness waiting for in-flight requests to complete
19/11/20 14:38:35 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:35 INFO data_plane.close: Closing all cached grpc data channels.
19/11/20 14:38:35 INFO sdk_worker.close: Closing all cached gRPC state handlers.
19/11/20 14:38:35 INFO sdk_worker.run: Done consuming work.
19/11/20 14:38:35 INFO sdk_worker_main.main: Python sdk harness exiting.
19/11/20 14:38:35 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Logging client hanged up.
19/11/20 14:38:35 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:35 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST
19/11/20 14:38:35 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST -> 0 artifacts
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Beam Fn Logging client connected.
19/11/20 14:38:36 INFO sdk_worker_main.main: Logging handler created.
19/11/20 14:38:36 INFO sdk_worker_main.start: Status HTTP server running at localhost:34981
19/11/20 14:38:36 INFO sdk_worker_main.main: semi_persistent_directory: /tmp
19/11/20 14:38:36 WARN sdk_worker_main._load_main_session: No session file found: /tmp/staged/pickled_main_session. Functions defined in __main__ (interactive session) may fail.
19/11/20 14:38:36 WARN pipeline_options.get_all_options: Discarding unparseable args: [u'--job_server_timeout=60', u'--app_name=test_windowing_1574260714.18_ce6f6c60-4aee-43c2-a2d6-5dd7ae57b385', u'--direct_runner_use_stacked_bundle', u'--spark_master=local', u'--options_id=30', u'--enable_spark_metric_sinks', u'--pipeline_type_check']
19/11/20 14:38:36 INFO sdk_worker_main.main: Python sdk harness started with pipeline_options: {'runner': u'None', 'experiments': [u'beam_fn_api'], 'environment_cache_millis': u'0', 'environment_type': u'PROCESS', 'sdk_location': u'container', 'job_name': u'test_windowing_1574260714.18', 'environment_config': u'{"command": "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"}', 'sdk_worker_parallelism': u'1', 'job_endpoint': u'localhost:33243'}
19/11/20 14:38:36 INFO statecache.__init__: Creating state cache with size 0
19/11/20 14:38:36 INFO sdk_worker.__init__: Creating insecure control channel for localhost:36079.
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: Beam Fn Control client connected with id 262-1
19/11/20 14:38:36 INFO sdk_worker.__init__: Control channel established.
19/11/20 14:38:36 INFO sdk_worker.__init__: Initializing SDKHarness with unbounded number of workers.
19/11/20 14:38:36 INFO sdk_worker.create_state_handler: Creating insecure state channel for localhost:34975.
19/11/20 14:38:36 INFO sdk_worker.create_state_handler: State channel established.
19/11/20 14:38:36 INFO data_plane.create_data_channel: Creating client data channel for localhost:40539
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory: Closing environment urn: "beam:env:process:v1" payload: "\032\202\001https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"
19/11/20 14:38:36 INFO sdk_worker.run: No more requests from control plane
19/11/20 14:38:36 INFO sdk_worker.run: SDK Harness waiting for in-flight requests to complete
19/11/20 14:38:36 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:36 INFO data_plane.close: Closing all cached grpc data channels.
19/11/20 14:38:36 INFO sdk_worker.close: Closing all cached gRPC state handlers.
19/11/20 14:38:36 INFO sdk_worker.run: Done consuming work.
19/11/20 14:38:36 INFO sdk_worker_main.main: Python sdk harness exiting.
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Logging client hanged up.
19/11/20 14:38:36 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST
19/11/20 14:38:36 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST -> 0 artifacts
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Beam Fn Logging client connected.
19/11/20 14:38:37 INFO sdk_worker_main.main: Logging handler created.
19/11/20 14:38:37 INFO sdk_worker_main.start: Status HTTP server running at localhost:46867
19/11/20 14:38:37 INFO sdk_worker_main.main: semi_persistent_directory: /tmp
19/11/20 14:38:37 WARN sdk_worker_main._load_main_session: No session file found: /tmp/staged/pickled_main_session. Functions defined in __main__ (interactive session) may fail.
19/11/20 14:38:37 WARN pipeline_options.get_all_options: Discarding unparseable args: [u'--job_server_timeout=60', u'--app_name=test_windowing_1574260714.18_ce6f6c60-4aee-43c2-a2d6-5dd7ae57b385', u'--direct_runner_use_stacked_bundle', u'--spark_master=local', u'--options_id=30', u'--enable_spark_metric_sinks', u'--pipeline_type_check']
19/11/20 14:38:37 INFO sdk_worker_main.main: Python sdk harness started with pipeline_options: {'runner': u'None', 'experiments': [u'beam_fn_api'], 'environment_cache_millis': u'0', 'environment_type': u'PROCESS', 'sdk_location': u'container', 'job_name': u'test_windowing_1574260714.18', 'environment_config': u'{"command": "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"}', 'sdk_worker_parallelism': u'1', 'job_endpoint': u'localhost:33243'}
19/11/20 14:38:37 INFO statecache.__init__: Creating state cache with size 0
19/11/20 14:38:37 INFO sdk_worker.__init__: Creating insecure control channel for localhost:44281.
19/11/20 14:38:37 INFO sdk_worker.__init__: Control channel established.
19/11/20 14:38:37 INFO sdk_worker.__init__: Initializing SDKHarness with unbounded number of workers.
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: Beam Fn Control client connected with id 263-1
19/11/20 14:38:37 INFO sdk_worker.create_state_handler: Creating insecure state channel for localhost:44181.
19/11/20 14:38:37 INFO sdk_worker.create_state_handler: State channel established.
19/11/20 14:38:37 INFO data_plane.create_data_channel: Creating client data channel for localhost:40627
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory: Closing environment urn: "beam:env:process:v1" payload: "\032\202\001https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"
19/11/20 14:38:37 INFO sdk_worker.run: No more requests from control plane
19/11/20 14:38:37 INFO sdk_worker.run: SDK Harness waiting for in-flight requests to complete
19/11/20 14:38:37 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:37 INFO data_plane.close: Closing all cached grpc data channels.
19/11/20 14:38:37 INFO sdk_worker.close: Closing all cached gRPC state handlers.
19/11/20 14:38:37 INFO sdk_worker.run: Done consuming work.
19/11/20 14:38:37 INFO sdk_worker_main.main: Python sdk harness exiting.
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Logging client hanged up.
19/11/20 14:38:37 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST
19/11/20 14:38:37 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST -> 0 artifacts
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Beam Fn Logging client connected.
19/11/20 14:38:38 INFO sdk_worker_main.main: Logging handler created.
19/11/20 14:38:38 INFO sdk_worker_main.start: Status HTTP server running at localhost:44003
19/11/20 14:38:38 INFO sdk_worker_main.main: semi_persistent_directory: /tmp
19/11/20 14:38:38 WARN sdk_worker_main._load_main_session: No session file found: /tmp/staged/pickled_main_session. Functions defined in __main__ (interactive session) may fail.
19/11/20 14:38:38 WARN pipeline_options.get_all_options: Discarding unparseable args: [u'--job_server_timeout=60', u'--app_name=test_windowing_1574260714.18_ce6f6c60-4aee-43c2-a2d6-5dd7ae57b385', u'--direct_runner_use_stacked_bundle', u'--spark_master=local', u'--options_id=30', u'--enable_spark_metric_sinks', u'--pipeline_type_check']
19/11/20 14:38:38 INFO sdk_worker_main.main: Python sdk harness started with pipeline_options: {'runner': u'None', 'experiments': [u'beam_fn_api'], 'environment_cache_millis': u'0', 'environment_type': u'PROCESS', 'sdk_location': u'container', 'job_name': u'test_windowing_1574260714.18', 'environment_config': u'{"command": "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"}', 'sdk_worker_parallelism': u'1', 'job_endpoint': u'localhost:33243'}
19/11/20 14:38:38 INFO statecache.__init__: Creating state cache with size 0
19/11/20 14:38:38 INFO sdk_worker.__init__: Creating insecure control channel for localhost:37061.
19/11/20 14:38:38 INFO sdk_worker.__init__: Control channel established.
19/11/20 14:38:38 INFO sdk_worker.__init__: Initializing SDKHarness with unbounded number of workers.
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: Beam Fn Control client connected with id 264-1
19/11/20 14:38:38 INFO sdk_worker.create_state_handler: Creating insecure state channel for localhost:40607.
19/11/20 14:38:38 INFO sdk_worker.create_state_handler: State channel established.
19/11/20 14:38:38 INFO data_plane.create_data_channel: Creating client data channel for localhost:36721
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory: Closing environment urn: "beam:env:process:v1" payload: "\032\202\001https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"
19/11/20 14:38:38 INFO sdk_worker.run: No more requests from control plane
19/11/20 14:38:38 INFO sdk_worker.run: SDK Harness waiting for in-flight requests to complete
19/11/20 14:38:38 INFO data_plane.close: Closing all cached grpc data channels.
19/11/20 14:38:38 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:38 INFO sdk_worker.close: Closing all cached gRPC state handlers.
19/11/20 14:38:38 INFO sdk_worker.run: Done consuming work.
19/11/20 14:38:38 INFO sdk_worker_main.main: Python sdk harness exiting.
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Logging client hanged up.
19/11/20 14:38:38 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST
19/11/20 14:38:38 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: GetManifest for /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST -> 0 artifacts
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Beam Fn Logging client connected.
19/11/20 14:38:39 INFO sdk_worker_main.main: Logging handler created.
19/11/20 14:38:39 INFO sdk_worker_main.start: Status HTTP server running at localhost:44923
19/11/20 14:38:39 INFO sdk_worker_main.main: semi_persistent_directory: /tmp
19/11/20 14:38:39 WARN sdk_worker_main._load_main_session: No session file found: /tmp/staged/pickled_main_session. Functions defined in __main__ (interactive session) may fail.
19/11/20 14:38:39 WARN pipeline_options.get_all_options: Discarding unparseable args: [u'--job_server_timeout=60', u'--app_name=test_windowing_1574260714.18_ce6f6c60-4aee-43c2-a2d6-5dd7ae57b385', u'--direct_runner_use_stacked_bundle', u'--spark_master=local', u'--options_id=30', u'--enable_spark_metric_sinks', u'--pipeline_type_check']
19/11/20 14:38:39 INFO sdk_worker_main.main: Python sdk harness started with pipeline_options: {'runner': u'None', 'experiments': [u'beam_fn_api'], 'environment_cache_millis': u'0', 'environment_type': u'PROCESS', 'sdk_location': u'container', 'job_name': u'test_windowing_1574260714.18', 'environment_config': u'{"command": "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"}', 'sdk_worker_parallelism': u'1', 'job_endpoint': u'localhost:33243'}
19/11/20 14:38:39 INFO statecache.__init__: Creating state cache with size 0
19/11/20 14:38:39 INFO sdk_worker.__init__: Creating insecure control channel for localhost:38011.
19/11/20 14:38:39 INFO sdk_worker.__init__: Control channel established.
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.control.FnApiControlClientPoolService: Beam Fn Control client connected with id 265-1
19/11/20 14:38:39 INFO sdk_worker.__init__: Initializing SDKHarness with unbounded number of workers.
19/11/20 14:38:39 INFO sdk_worker.create_state_handler: Creating insecure state channel for localhost:41833.
19/11/20 14:38:39 INFO sdk_worker.create_state_handler: State channel established.
19/11/20 14:38:39 INFO data_plane.create_data_channel: Creating client data channel for localhost:37195
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.data.GrpcDataService: Beam Fn Data client connected.
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory: Closing environment urn: "beam:env:process:v1" payload: "\032\202\001https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build/sdk_worker.sh"
19/11/20 14:38:39 INFO sdk_worker.run: No more requests from control plane
19/11/20 14:38:39 INFO sdk_worker.run: SDK Harness waiting for in-flight requests to complete
19/11/20 14:38:39 INFO data_plane.close: Closing all cached grpc data channels.
19/11/20 14:38:39 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:39 INFO sdk_worker.close: Closing all cached gRPC state handlers.
19/11/20 14:38:39 INFO sdk_worker.run: Done consuming work.
19/11/20 14:38:39 INFO sdk_worker_main.main: Python sdk harness exiting.
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.logging.GrpcLoggingService: Logging client hanged up.
19/11/20 14:38:39 WARN org.apache.beam.sdk.fn.data.BeamFnDataGrpcMultiplexer: Hanged up for unknown endpoint.
19/11/20 14:38:39 INFO org.apache.beam.runners.spark.SparkPipelineRunner: Job test_windowing_1574260714.18_ce6f6c60-4aee-43c2-a2d6-5dd7ae57b385 finished.
19/11/20 14:38:39 WARN org.apache.beam.runners.spark.SparkPipelineResult$BatchMode: Collecting monitoring infos is not implemented yet in Spark portable runner.
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.artifact.AbstractArtifactRetrievalService: Manifest at /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/MANIFEST has 0 artifact locations
19/11/20 14:38:39 INFO org.apache.beam.runners.fnexecution.artifact.BeamFileSystemArtifactStagingService: Removed dir /tmp/sparktestM8epv3/job_a170b361-0556-4262-8ba2-aeaec0641ddd/
INFO:apache_beam.runners.portability.portable_runner:Job state changed to DONE
.
======================================================================
ERROR: test_pardo_state_with_custom_key_coder (__main__.SparkRunnerTest)
Tests that state requests work correctly when the key coder is an
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apache_beam/runners/portability/portable_runner_test.py", line 231, in test_pardo_state_with_custom_key_coder
    equal_to(expected))
  File "apache_beam/pipeline.py", line 436, in __exit__
    self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 428, in wait_until_finish
    for state_response in self._state_stream:
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py", line 395, in next
    return self._next()
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py", line 552, in _next
    _common.wait(self._state.condition.wait, _response_ready)
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py", line 140, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py", line 105, in _wait_once
    wait_fn(timeout=timeout)
  File "/usr/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
  File "apache_beam/runners/portability/portable_runner_test.py", line 75, in handler
    raise BaseException(msg)
BaseException: Timed out after 60 seconds.

==================== Timed out after 60 seconds. ====================
# Thread: <Thread(wait_until_finish_read, started daemon 140314589488896)>
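The "BaseException: Timed out after 60 seconds." above, together with the "==================== Timed out ... ====================" banner and the "# Thread: <...>" lines woven through these tracebacks, comes from a watchdog the test harness arms around wait_until_finish() (the `handler` frame at portable_runner_test.py line 75): once the deadline passes, it dumps the live threads and raises to break the hang. A minimal sketch of that pattern, assuming a POSIX SIGALRM timer; the helper name `install_timeout` is illustrative, not Beam's actual API:

    import signal
    import sys
    import threading
    import traceback

    TIMEOUT_SECS = 60  # matches the 60-second deadline seen in this log

    def install_timeout(timeout=TIMEOUT_SECS):
        # Illustrative watchdog: dump live threads and raise after `timeout`.
        def handler(signum, frame):
            msg = 'Timed out after %d seconds.' % timeout
            print('=' * 20 + ' ' + msg + ' ' + '=' * 20)
            frames = sys._current_frames()
            for t in threading.enumerate():
                # One line per thread, like the "# Thread: <...>" lines above.
                print('# Thread: %s' % t)
                if t.ident in frames:
                    traceback.print_stack(frames[t.ident])
            # BaseException (not Exception) so it escapes broad
            # `except Exception` handlers and unsticks the blocked wait.
            raise BaseException(msg)
        signal.signal(signal.SIGALRM, handler)
        signal.alarm(timeout)

    # Usage sketch: arm before a call that can hang, disarm on success.
    #   install_timeout()
    #   result.wait_until_finish()
    #   signal.alarm(0)

Because the handler prints to the same stderr the test runner is using, its output can interleave with the unittest traceback, which is why the thread dump appears mid-traceback in the raw log.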
======================================================================
ERROR: test_pardo_timers (__main__.SparkRunnerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 328, in test_pardo_timers
    assert_that(actual, equal_to(expected))
  File "apache_beam/pipeline.py", line 436, in __exit__
    self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 428, in wait_until_finish
    for state_response in self._state_stream:
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py", line 395, in next
    return self._next()
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_channel.py", line 552, in _next
    _common.wait(self._state.condition.wait, _response_ready)
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py", line 140, in wait
    _wait_once(wait_fn, MAXIMUM_WAIT_TIMEOUT, spin_cb)
  File "https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/build/gradleenv/1866363813/local/lib/python2.7/site-packages/grpc/_common.py", line 105, in _wait_once
    wait_fn(timeout=timeout)
  File "/usr/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
  File "apache_beam/runners/portability/portable_runner_test.py", line 75, in handler
    raise BaseException(msg)
BaseException: Timed out after 60 seconds.

# Thread: <Thread(Thread-118, started daemon 140314597881600)>
# Thread: <_MainThread(MainThread, started 140315379283712)>
==================== Timed out after 60 seconds. ====================
# Thread: <Thread(wait_until_finish_read, started daemon 140314098530048)>
# Thread: <Thread(Thread-124, started daemon 140314090137344)>
# Thread: <_MainThread(MainThread, started 140315379283712)>

======================================================================
ERROR: test_sdf_with_watermark_tracking (__main__.SparkRunnerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "apache_beam/runners/portability/fn_api_runner_test.py", line 499, in test_sdf_with_watermark_tracking
    assert_that(actual, equal_to(list(''.join(data))))
  File "apache_beam/pipeline.py", line 436, in __exit__
    self.run().wait_until_finish()
  File "apache_beam/runners/portability/portable_runner.py", line 438, in wait_until_finish
    self._job_id, self._state, self._last_error_message()))
RuntimeError: Pipeline test_sdf_with_watermark_tracking_1574260705.52_39edadb2-ae7d-49b9-afe2-79d8bc21cf60 failed in state FAILED: java.lang.UnsupportedOperationException: The ActiveBundle does not have a registered bundle checkpoint handler.

----------------------------------------------------------------------
Ran 38 tests in 326.382s

FAILED (errors=3, skipped=9)

> Task :sdks:python:test-suites:portable:py2:sparkValidatesRunner FAILED

FAILURE: Build failed with an exception.

* Where:
Build file 'https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/ws/src/sdks/python/test-suites/portable/py2/build.gradle' line: 198

* What went wrong:
Execution failed for task ':sdks:python:test-suites:portable:py2:sparkValidatesRunner'.
> Process 'command 'sh'' finished with non-zero exit value 1

* Try:
Run with --stacktrace option to get the stack trace.
Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 6.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/5.2.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 8m 29s
60 actionable tasks: 52 executed, 8 from cache

Publishing build scan...
https://gradle.com/s/admroodybd366

Build step 'Invoke Gradle script' changed build result to FAILURE
Build step 'Invoke Gradle script' marked build as failure

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
