This is an automated email from the ASF dual-hosted git repository. aromanenko pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git.
from 0b415fd Merge pull request #10160 More compartmentalization of bundle-based-runner only utilities. add 6f72c93 [BEAM-8470] Add an empty spark-structured-streaming runner project targeting spark 2.4.0 add 4ca7e55 [BEAM-8470] Fix missing dep add 0fc0d0a4 [BEAM-8470] Add SparkPipelineOptions add e8ca23e [BEAM-8470] Start pipeline translation add 00964d2 [BEAM-8470] Add global pipeline translation structure add a3b278e [BEAM-8470] Add nodes translators structure add ef4941a [BEAM-8470] Wire node translators with pipeline translator add 8a8dc1e [BEAM-8470] Renames: better differenciate pipeline translator for transform translator add cdfd589 [BEAM-8470] Organise methods in PipelineTranslator add 38eca95 [BEAM-8470] Initialise BatchTranslationContext add 80f2d8c [BEAM-8470] Refactoring: -move batch/streaming common translation visitor and utility methods to PipelineTranslator -rename batch dedicated classes to Batch* to differentiate with their streaming counterparts -Introduce TranslationContext for common batch/streaming components add baf210f [BEAM-8470] Make transform translation clearer: renaming, comments add b65a9da [BEAM-8470] Improve javadocs add 0434749 [BEAM-8470] Move SparkTransformOverrides to correct package add 4372c7e [BEAM-8470] Move common translation context components to superclass add 49b666b [BEAM-8470] apply spotless add 0d6906a [BEAM-8470] Make codestyle and firebug happy add ef97440 [BEAM-8470] Add TODOs add 9abf8ac [BEAM-8470] Post-pone batch qualifier in all classes names for readability add 11a6e19 [BEAM-8470] Add precise TODO for multiple TransformTranslator per transform URN add 47ed3d1 [BEAM-8470] Added SparkRunnerRegistrar add 022a0d0 [BEAM-8470] Add basic pipeline execution. Refactor translatePipeline() to return the translationContext on which we can run startPipeline() add 96b3f36 [BEAM-8470] Create PCollections manipulation methods add b0c42af [BEAM-8470] Create Datasets manipulation methods add 2c5cb23 [BEAM-8470] Add Flatten transformation translator add 9f1bf60 [BEAM-8470] Add primitive GroupByKeyTranslatorBatch implementation add 98ea9fb [BEAM-8470] Use Iterators.transform() to return Iterable add 0b55323 [BEAM-8470] Implement read transform add 4c91a57 [BEAM-8470] update TODO add 4adf3bb [BEAM-8470] Apply spotless add 6b4b916 [BEAM-8470] start source instanciation add ff60578 [BEAM-8470] Improve exception flow add 2ee98da [BEAM-8470] Improve type enforcement in ReadSourceTranslator add fc3abf5 [BEAM-8470] Experiment over using spark Catalog to pass in Beam Source through spark Table add 4746d9b [BEAM-8470] Add source mocks add e45e48d [BEAM-8470] fix mock, wire mock in translators and create a main test. add 7d7fe77 [BEAM-8470] Use raw WindowedValue so that spark Encoders could work (temporary) add 9d84a0f [BEAM-8470] clean deps add b4032aa [BEAM-8470] Move DatasetSourceMock to proper batch mode add a7ad1ab [BEAM-8470] Run pipeline in batch mode or in streaming mode add 141e4bc [BEAM-8470] Split batch and streaming sources and translators add 8954c50 [BEAM-8470] Use raw Encoder<WindowedValue> also in regular ReadSourceTranslatorBatch add 00ef268 [BEAM-8470] Clean add c426c98 [BEAM-8470] Add ReadSourceTranslatorStreaming add 1740dc4 [BEAM-8470] Move Source and translator mocks to a mock package. add 7819918 [BEAM-8470] Pass Beam Source and PipelineOptions to the spark DataSource as serialized strings add b10aa53 [BEAM-8470] Refactor DatasetSource fields add c4bb08c [BEAM-8470] Wire real SourceTransform and not mock and update the test add 5bbea63 [BEAM-8470] Add missing 0-arg public constructor add 0dbe26f [BEAM-8470] Use new PipelineOptionsSerializationUtils add 43052d3 [BEAM-8470] Apply spotless and fix checkstyle add 17ca18b [BEAM-8470] Add a dummy schema for reader add 2e8393b [BEAM-8470] Add empty 0-arg constructor for mock source add 43ff919 [BEAM-8470] Clean add c8ad727 [BEAM-8470] Checkstyle and Findbugs add 6e3575d [BEAM-8470] Refactor SourceTest to a UTest instaed of a main add c221aaa [BEAM-8470] Fix pipeline triggering: use a spark action instead of writing the dataset add c26c421 [BEAM-8470] improve readability of options passing to the source add ff69ded [BEAM-8470] Clean unneeded fields in DatasetReader add 524667e [BEAM-8470] Fix serialization issues add 638bdae [BEAM-8470] Add SerializationDebugger add fd354fa [BEAM-8470] Add serialization test add 163102b [BEAM-8470] Move SourceTest to same package as tested class add 68fc6d5 [BEAM-8470] Fix SourceTest add e4c76fc [BEAM-8470] Simplify beam reader creation as it created once the source as already been partitioned add 90463a0 [BEAM-8470] Put all transform translators Serializable add 940b484 [BEAM-8470] Enable test mode add 0b156bf [BEAM-8470] Enable gradle build scan add 74080c1 [BEAM-8470] Add flatten test add 740131e [BEAM-8470] First attempt for ParDo primitive implementation add 0cedcd7 [BEAM-8470] Serialize windowedValue to byte[] in source to be able to specify a binary dataset schema and deserialize windowedValue from Row to get a dataset<WindowedValue> add 79b075a [BEAM-8470] Comment schema choices add 5d1b2b5 [BEAM-8470] Fix errorprone add 0529996 [BEAM-8470] Fix testMode output to comply with new binary schema add 696597c [BEAM-8470] Cleaning add 91964b9 [BEAM-8470] Remove bundleSize parameter and always use spark default parallelism add 35051e0 [BEAM-8470] Fix split bug add 6902b4b [BEAM-8470] Clean add e5a36f0 [BEAM-8470] Add ParDoTest add 057e0ac [BEAM-8470] Address minor review notes add b706a74 [BEAM-8470] Clean add 83779db [BEAM-8470] Add GroupByKeyTest add 0330f32 [BEAM-8470] Add comments and TODO to GroupByKeyTranslatorBatch add 1cb53e3 [BEAM-8470] Fix type checking with Encoder of WindowedValue<T> add e363586 [BEAM-8470] Port latest changes of ReadSourceTranslatorBatch to ReadSourceTranslatorStreaming add 9b9cca6 [BEAM-8470] Remove no more needed putDatasetRaw add cb1a085 [BEAM-8470] Add ComplexSourceTest add c13e8ae [BEAM-8470] Fail in case of having SideInouts or State/Timers add 83e4067 [BEAM-8470] Fix Encoders: create an Encoder for every manipulated type to avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders with generic types such as WindowedValue<T> and get the type checking back add d19c8b9 [BEAM-8470] Apply spotless add 1051eba [BEAM-8470] Fixed Javadoc error add ee67f21 [BEAM-8470] Rename SparkSideInputReader class and rename pruneOutput() to pruneOutputFilteredByTag() add 8e0f183 [BEAM-8470] Don't use deprecated sideInput.getWindowingStrategyInternal() add f0ff5b4 [BEAM-8470] Simplify logic of ParDo translator add 9c4e36d [BEAM-8470] Fix kryo issue in GBK translator with a workaround add b3a740c [BEAM-8470] Rename SparkOutputManager for consistency add 438f1e9 [BEAM-8470] Fix for test elements container in GroupByKeyTest add be1cd23 [BEAM-8470] Added "testTwoPardoInRow" add ed6b12c [BEAM-8470] Add a test for the most simple possible Combine add 35f100c [BEAM-8470] Rename SparkDoFnFilterFunction to DoFnFilterFunction for consistency add 8111241 [BEAM-8470] Generalize the use of SerializablePipelineOptions in place of (not serializable) PipelineOptions add 2d0c33e [BEAM-8470] Fix getSideInputs add 8662ef8 [BEAM-8470] Extract binary schema creation in a helper class add 9a757d0 [BEAM-8470] First version of combinePerKey add 3da0276 [BEAM-8470] Improve type checking of Tuple2 encoder add ca39fe1 [BEAM-8470] Introduce WindowingHelpers (and helpers package) and use it in Pardo, GBK and CombinePerKey add f002039 [BEAM-8470] Fix combiner using KV as input, use binary encoders in place of accumulatorEncoder and outputEncoder, use helpers, spotless add cec975f [BEAM-8470] Add combinePerKey and CombineGlobally tests add 39a978a [BEAM-8470] Introduce RowHelpers add c3cd76f [BEAM-8470] Add CombineGlobally translation to avoid translating Combine.perKey as a composite transform based on Combine.PerKey (which uses low perf GBK) add 9ee65d5 [BEAM-8470] Cleaning add 279d2ca [BEAM-8470] Get back to classes in translators resolution because URNs cannot translate Combine.Globally add 55bf503 [BEAM-8470] Fix various type checking issues in Combine.Globally add d5dab93 [BEAM-8470] Update test with Long add fd1d8c5 [BEAM-8470] Fix combine. For unknown reason GenericRowWithSchema is used as input of combine so extract its content to be able to proceed add 5c074eb [BEAM-8470] Use more generic Row instead of GenericRowWithSchema add 341c39f [BEAM-8470] Add explanation about receiving a Row as input in the combiner add 89d38b1 [BEAM-8470] Fix encoder bug in combinePerkey add 6b109d3 [BEAM-8470] Cleaning add 9494886 [BEAM-8470] Implement WindowAssignTranslatorBatch add c83a607 [BEAM-8470] Implement WindowAssignTest add 9bf9385 [BEAM-8470] Fix javadoc add 8f5e9dc [BEAM-8470] Added SideInput support add 915ca01 [BEAM-8470] Fix CheckStyle violations add 92cab8e [BEAM-8470] Don't use Reshuffle translation add 6bbeeff [BEAM-8470] Added using CachedSideInputReader add f8512c7 [BEAM-8470] Added TODO comment for ReshuffleTranslatorBatch add 19ee565 [BEAM-8470] And unchecked warning suppression add 1bf9a4a [BEAM-8470] Add streaming source initialisation add 6ca706a [BEAM-8470] Implement first streaming source add b5ccec6 [BEAM-8470] Add a TODO on spark output modes add 175940d [BEAM-8470] Add transformators registry in PipelineTranslatorStreaming add ba59f10 [BEAM-8470] Add source streaming test add 80dde2d [BEAM-8470] Specify checkpointLocation at the pipeline start add e494189 [BEAM-8470] Clean unneeded 0 arg constructor in batch source add e92adee [BEAM-8470] Clean streaming source add 31a2be5 [BEAM-8470] Continue impl of offsets for streaming source add 8777607 [BEAM-8470] Deal with checkpoint and offset based read add 43088fb [BEAM-8470] Apply spotless and fix spotbugs warnings add 75e3f7f [BEAM-8470] Disable never ending test SimpleSourceTest.testUnboundedSource add 9e60f27 [BEAM-8470] Fix access level issues, typos and modernize code to Java 8 style add 0492206 [BEAM-8470] Merge Spark Structured Streaming runner into main Spark module. Remove spark-structured-streaming module.Rename Runner to SparkStructuredStreamingRunner. Remove specific PipelineOotions and Registrar for Structured Streaming Runner add 0e37bae [BEAM-8470] Fix non-vendored imports from Spark Streaming Runner classes add 01a2fd4 [BEAM-8470] Pass doFnSchemaInformation to ParDo batch translation add f950dbf [BEAM-8470] Fix spotless issues after rebase add 82cf713 [BEAM-8470] Fix logging levels in Spark Structured Streaming translation add f44c863 [BEAM-8470] Add SparkStructuredStreamingPipelineOptions and SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added to have the new runner rely only on its specific options. add d0f2ba3 [BEAM-8470] Rename SparkPipelineResult to SparkStructuredStreamingPipelineResult This is done to avoid an eventual collision with the one in SparkRunner. However this cannot happen at this moment because it is package private, so it is also done for consistency. add 23e3c53 [BEAM-8470] Use PAssert in Spark Structured Streaming transform tests add 3ba3e51 [BEAM-8470] Ignore spark offsets (cf javadoc) add 05266b9 [BEAM-8470] implement source.stop add 75209f9 [BEAM-8470] Update javadoc add f11e382 [BEAM-8470] Apply Spotless add 075a79b [BEAM-8470] Enable batch Validates Runner tests for Structured Streaming Runner add de5b17b [BEAM-8470] Limit the number of partitions to make tests go 300% faster add 71edc3e [BEAM-8470] Fixes ParDo not calling setup and not tearing down if exception on startBundle add 5972c51 [BEAM-8470] Pass transform based doFnSchemaInformation in ParDo translation add 3dbcaba [BEAM-8470] Consider null object case on RowHelpers, fixes empty side inputs tests. add a687d5d [BEAM-8470] Put back batch/simpleSourceTest.testBoundedSource add b70f605 [BEAM-8470] Update windowAssignTest add e070a69 [BEAM-8470] Add comment about checkpoint mark add 2c1437c [BEAM-8470] Re-code GroupByKeyTranslatorBatch to conserve windowing instead of unwindowing/windowing(GlobalWindow): simplify code, use ReduceFnRunner to merge the windows add 01184ff [BEAM-8470] re-enable reduceFnRunner timers for output add dfed0a5 [BEAM-8470] Improve visibility of debug messages add c7dd248 [BEAM-8470] Add a test that GBK preserves windowing add 65e1b79 [BEAM-8470] Add TODO in Combine translations add 346f2f4 [BEAM-8470] Update KVHelpers.extractKey() to deal with WindowedValue and update GBK and CPK add c9f4525 [BEAM-8470] Fix comment about schemas add b99c9ef [BEAM-8470] Implement reduce part of CombineGlobally translation with windowing add 1b6339e [BEAM-8470] Output data after combine add 66f6021 [BEAM-8470] Implement merge accumulators part of CombineGlobally translation with windowing add 6168070 [BEAM-8470] Fix encoder in combine call add 0c00e2f [BEAM-8470] Revert extractKey while combinePerKey is not done (so that it compiles) add f9ecf1f [BEAM-8470] Apply a groupByKey avoids for some reason that the spark structured streaming fmwk casts data to Row which makes it impossible to deserialize without the coder shipped into the data. For performance reasons (avoid memory consumption and having to deserialize), we do not ship coder + data. Also add a mapparitions before GBK to avoid shuffling add abaf140 [BEAM-8470] Fix case when a window does not merge into any other window add 319e01f [BEAM-8470] Fix wrong encoder in combineGlobally GBK add a759ddc [BEAM-8470] Fix bug in the window merging logic add d609213 [BEAM-8470] Remove the mapPartition that adds a key per partition because otherwise spark will reduce values per key instead of globally add 2cbd35d [BEAM-8470] Remove CombineGlobally translation because it is less performant than the beam sdk one (key + combinePerKey.withHotkeyFanout) add 24c137a [BEAM-8470] Now that there is only Combine.PerKey translation, make only one Aggregator add b7e4a8b [BEAM-8470] Clean no more needed KVHelpers add 5215a5a [BEAM-8470] Clean not more needed RowHelpers add 4e06ffe [BEAM-8470] Clean not more needed WindowingHelpers add c83168c [BEAM-8470] Fix javadoc of AggregatorCombiner add 5898cf0 [BEAM-8470] Fixed immutable list bug add 37b5fa8 [BEAM-8470] add comment in combine globally test add 6eea481 [BEAM-8470] Clean groupByKeyTest add 3e33078 [BEAM-8470] Add a test that combine per key preserves windowing add 55d865b [BEAM-8470] Ignore for now not working test testCombineGlobally add 65ffdd8 [BEAM-8470] Add metrics support in DoFn add e2b57ac [BEAM-8470] Add missing dependencies to run Spark Structured Streaming Runner on Nexmark add 323d10f [BEAM-8470] Add setEnableSparkMetricSinks() method add 2f1d38b [BEAM-8470] Fix javadoc add 06559d2 [BEAM-8470] Fix accumulators initialization in Combine that prevented CombineGlobally to work. add 3cce68c [BEAM-8470] Add a test to check that CombineGlobally preserves windowing add 68b6f98 [BEAM-8470] Persist all output Dataset if there are multiple outputs in pipeline Enabled Use*Metrics tests add 6084764 [BEAM-8470] Added metrics sinks and tests add 69c514f [BEAM-8470] Make spotless happy add 4cab501 [BEAM-8470] Add PipelineResults to Spark structured streaming. add a1ec37a [BEAM-8470] Update log4j configuration add 2a53b32 [BEAM-8470] Add spark execution plans extended debug messages. add f34de05 [BEAM-8470] Print number of leaf datasets add 892b742 [BEAM-8470] fixup! Add PipelineResults to Spark structured streaming. add 613bac9 [BEAM-8470] Remove no more needed AggregatorCombinerPerKey (there is only AggregatorCombiner) add d024335 [BEAM-8470] After testing performance and correctness, launch pipeline with dataset.foreach(). Make both test mode and production mode use foreach for uniformity. Move dataset print as a utility method add 7a30c5a [BEAM-8470] Improve Pardo translation performance: avoid calling a filter transform when there is only one output tag add 0e42c18 [BEAM-8470] Use "sparkMaster" in local mode to obtain number of shuffle partitions + spotless apply add 7f08fc8 [BEAM-8470] Wrap Beam Coders into Spark Encoders using ExpressionEncoder: serialization part add 2071f1c [BEAM-8470] Wrap Beam Coders into Spark Encoders using ExpressionEncoder: deserialization part add 6f0e9fa [BEAM-8470] type erasure: spark encoders require a Class<T>, pass Object and cast to Class<T> add 965814a [BEAM-8470] Fix scala Product in Encoders to avoid StackEverflow add 165d8b7 [BEAM-8470] Conform to spark ExpressionEncoders: pass classTags, implement scala Product, pass children from within the ExpressionEncoder, fix visibilities add 609aace [BEAM-8470] Add a simple spark native test to test Beam coders wrapping into Spark Encoders add 09e882e [BEAM-8470] Fix code generation in Beam coder wrapper add d715e37 [BEAM-8470] Lazy init coder because coder instance cannot be interpolated by catalyst add d27962c [BEAM-8470] Fix warning in coder construction by reflexion add fe46a63 [BEAM-8470] Fix ExpressionEncoder generated code: typos, try catch, fqcn add 64a978d [BEAM-8470] Fix getting the output value in code generation add cfecb40 [BEAM-8470] Fix beam coder lazy init using reflexion: use .clas + try catch + cast add 0d13b77 [BEAM-8470] Remove lazy init of beam coder because there is no generic way on instanciating a beam coder add 1a6e662 [BEAM-8470] Remove example code add e45e7b5 [BEAM-8470] Fix equal and hashcode add dd63c1b [BEAM-8470] Fix generated code: uniform exceptions catching, fix parenthesis and variable declarations add 75a8b24 [BEAM-8470] Add an assert of equality in the encoders test add ff50364 [BEAM-8470] Apply spotless and checkstyle and add javadocs add 4e215e0 [BEAM-8470] Wrap exceptions in UserCoderExceptions add 8ed3db6 [BEAM-8470] Put Encoders expressions serializable add 465d570 [BEAM-8470] Catch Exception instead of IOException because some coders to not throw Exceptions at all (e.g.VoidCoder) add 4579663 [BEAM-8470] Apply new Encoders to CombinePerKey add d246520 [BEAM-8470] Apply new Encoders to Read source add a448cfb [BEAM-8470] Improve performance of source: the mapper already calls windowedValueCoder.decode, no need to call it also in the Spark encoder add c875dbc [BEAM-8470] Ignore long time failing test: SparkMetricsSinkTest add be7415d [BEAM-8470] Apply new Encoders to Window assign translation add cdd6e1a [BEAM-8470] Apply new Encoders to AggregatorCombiner add 9ae6a57 [BEAM-8470] Create a Tuple2Coder to encode scala tuple2 add f827291 [BEAM-8470] Apply new Encoders to GroupByKey add 76125e9 [BEAM-8470] Apply new Encoders to Pardo. Replace Tuple2Coder with MultiOutputCoder to deal with multiple output to use in Spark Encoder for DoFnRunner add f492ccf [BEAM-8470] Apply spotless, fix typo and javadoc add d81d2f6 [BEAM-8470] Use beam encoders also in the output of the source translation add 9599546 [BEAM-8470] Remove unneeded cast add 0e072d3 [BEAM-8470] Fix: Remove generic hack of using object. Use actual Coder encodedType in Encoders add da8dfbd [BEAM-8470] Remove Encoders based on kryo now that we call Beam coders in the runner add 470d04f [BEAM-8470] Add a jenkins job for validates runner tests in the new spark runner add 17ace84 [BEAM-8470] Apply spotless add 2ca30a4 [BEAM-8470] Rebase on master: pass sideInputMapping in SimpleDoFnRunner as needed now in the API add 99e081d Fix SpotBugs add 52aae7f [BEAM-8470] simplify coders in combinePerKey translation add 803bb0b [BEAM-8470] Fix combiner. Do not reuse instance of accumulator add a4c4ee7 [BEAM-8470] input windows can arrive exploded (for sliding windows). As a result an input has multiple windows. So we need to consider that the accumulator can have multiple windows add ea47c71 [BEAM-8470] Add a combine test with sliding windows add 89eb1f0 [BEAM-8470] Add a test to test combine translation on binaryCombineFn with sliding windows add 4416a2e [BEAM-8470] Fix tests: use correct SparkStructuredStreamingPipelineOptions, set testMode to true. Some renaming add b980cf9 [BEAM-8470] Fix wrong expected results in CombineTest.testBinaryCombineWithSlidingWindows add 22d98df [BEAM-8470] Add disclaimers about this runner being experimental add d680252 [BEAM-8470] Fix: create an empty accumulator in combine.mergeAccumulators, because this method modifies its first input accumulator. Decrease memory usage by storing only accumulator and timestamp in the combine.merge map add 9922bcb [BEAM-8470] Apply spotless add a0872bc [BEAM-8470] Add a countPerElement test with sliding windows add c9ce072 [BEAM-8470] Fix the output timestamps of combine: timestamps must be merged to a window before combining add f516554 [BEAM-8470] set log level to info to avoid resource consumption in production mode add 27dcc48 [BEAM-8470] Fix CombineTest.testCountPerElementWithSlidingWindows expected results add f5cfbc1 [BEAM-8470] Remove "validatesStructuredStreamingRunnerBatch" from "validatesRunner" task add ac3132c [BEAM-8470] Fix timestamps in combine output: assign the timestamp to the window and not merge all the timestamps before combine add 18059ee Merge pull request #9866: [BEAM-8470] Create a new Spark runner based on Spark Structured streaming framework No new revisions were added by this update. Summary of changes: ...ValidatesRunner_SparkStructuredStreaming.groovy | 43 +++ .../org/apache/beam/gradle/BeamModulePlugin.groovy | 2 + runners/spark/build.gradle | 47 +++ .../runners/spark/SparkCommonPipelineOptions.java | 74 +++++ .../beam/runners/spark/SparkPipelineOptions.java | 47 +-- .../beam/runners/spark/SparkRunnerRegistrar.java | 8 +- .../SparkStructuredStreamingPipelineOptions.java | 41 +++ .../SparkStructuredStreamingPipelineResult.java | 151 ++++++++++ .../SparkStructuredStreamingRunner.java | 221 ++++++++++++++ .../aggregators/AggregatorsAccumulator.java | 70 +++++ .../aggregators/NamedAggregators.java | 110 +++++++ .../aggregators/NamedAggregatorsAccumulator.java | 63 ++++ .../aggregators/package-info.java | 20 ++ .../structuredstreaming/examples/WordCount.java | 132 +++++++++ .../metrics/AggregatorMetric.java | 39 +++ .../metrics/AggregatorMetricSource.java | 49 ++++ .../metrics/CompositeSource.java | 45 +++ .../metrics/MetricsAccumulator.java | 71 +++++ .../MetricsContainerStepMapAccumulator.java | 65 +++++ .../metrics/SparkBeamMetric.java | 89 ++++++ .../metrics/SparkBeamMetricSource.java | 48 +++ .../metrics/SparkMetricsContainerStepMap.java | 42 +++ .../metrics/WithMetricsSupport.java | 177 +++++++++++ .../structuredstreaming/metrics/package-info.java | 20 ++ .../metrics/sink/CodahaleCsvSink.java | 36 +++ .../metrics/sink/CodahaleGraphiteSink.java | 34 +++ .../metrics/sink/package-info.java | 20 ++ .../spark/structuredstreaming/package-info.java | 20 ++ .../translation/PipelineTranslator.java | 215 ++++++++++++++ .../translation/SchemaHelpers.java | 37 +++ .../translation/SparkTransformOverrides.java | 52 ++++ .../translation/TransformTranslator.java | 28 ++ .../translation/TranslationContext.java | 257 ++++++++++++++++ .../translation/batch/AggregatorCombiner.java | 238 +++++++++++++++ .../batch/CombinePerKeyTranslatorBatch.java | 111 +++++++ .../CreatePCollectionViewTranslatorBatch.java | 59 ++++ .../translation/batch/DatasetSourceBatch.java | 161 ++++++++++ .../translation/batch/DoFnFunction.java | 161 ++++++++++ .../translation/batch/DoFnRunnerWithMetrics.java | 96 ++++++ .../translation/batch/FlattenTranslatorBatch.java | 63 ++++ .../batch/GroupByKeyTranslatorBatch.java | 121 ++++++++ .../translation/batch/ParDoTranslatorBatch.java | 251 ++++++++++++++++ .../translation/batch/PipelineTranslatorBatch.java | 94 ++++++ .../translation/batch/ProcessContext.java | 138 +++++++++ .../batch/ReadSourceTranslatorBatch.java | 85 ++++++ .../batch/ReshuffleTranslatorBatch.java | 29 ++ .../batch/WindowAssignTranslatorBatch.java | 58 ++++ .../GroupAlsoByWindowViaOutputBufferFn.java | 165 +++++++++++ .../batch/functions/NoOpStepContext.java | 36 +++ .../batch/functions/SparkSideInputReader.java | 146 ++++++++++ .../translation/batch/functions/package-info.java | 20 ++ .../translation/batch/package-info.java | 20 ++ .../translation/helpers/CoderHelpers.java | 63 ++++ .../translation/helpers/EncoderHelpers.java | 324 +++++++++++++++++++++ .../translation/helpers/KVHelpers.java | 31 ++ .../translation/helpers/MultiOuputCoder.java | 80 +++++ .../translation/helpers/ReduceFnRunnerHelpers.java | 88 ++++++ .../translation/helpers/RowHelpers.java | 75 +++++ .../translation/helpers/SideInputBroadcast.java | 46 +++ .../translation/helpers/WindowingHelpers.java | 82 ++++++ .../translation/helpers/package-info.java | 20 ++ .../translation/package-info.java | 20 ++ .../streaming/DatasetSourceStreaming.java | 258 ++++++++++++++++ .../streaming/PipelineTranslatorStreaming.java | 88 ++++++ .../streaming/ReadSourceTranslatorStreaming.java | 85 ++++++ .../translation/streaming/package-info.java | 20 ++ .../translation/utils/CachedSideInputReader.java | 91 ++++++ .../translation/utils/SideInputStorage.java | 103 +++++++ .../translation/utils/package-info.java | 20 ++ runners/spark/src/main/resources/log4j.properties | 40 +++ .../runners/spark/SparkRunnerRegistrarTest.java | 7 +- .../StructuredStreamingPipelineStateTest.java | 225 ++++++++++++++ .../aggregators/metrics/sink/InMemoryMetrics.java | 82 ++++++ .../metrics/sink/InMemoryMetricsSinkRule.java | 28 ++ .../metrics/sink/SparkMetricsSinkTest.java | 84 ++++++ .../metrics/BeamMetricTest.java | 47 +++ .../metrics/MetricsPusherTest.java | 91 ++++++ .../translation/batch/CombineTest.java | 186 ++++++++++++ .../translation/batch/ComplexSourceTest.java | 86 ++++++ .../translation/batch/FlattenTest.java | 59 ++++ .../translation/batch/GroupByKeyTest.java | 124 ++++++++ .../translation/batch/ParDoTest.java | 153 ++++++++++ .../translation/batch/SimpleSourceTest.java | 101 +++++++ .../translation/batch/WindowAssignTest.java | 69 +++++ .../translation/streaming/SimpleSourceTest.java | 57 ++++ .../structuredstreaming/utils/EncodersTest.java | 52 ++++ .../utils/SerializationDebugger.java | 115 ++++++++ .../structuredstreaming/utils/package-info.java | 20 ++ .../{log4j.properties => log4j-test.properties} | 2 +- sdks/java/testing/nexmark/build.gradle | 3 +- 90 files changed, 7498 insertions(+), 52 deletions(-) create mode 100644 .test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineResult.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/AggregatorsAccumulator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/NamedAggregators.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/NamedAggregatorsAccumulator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/examples/WordCount.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/AggregatorMetric.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/AggregatorMetricSource.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/CompositeSource.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsAccumulator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsContainerStepMapAccumulator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkBeamMetric.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkBeamMetricSource.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkMetricsContainerStepMap.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/WithMetricsSupport.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/CodahaleCsvSink.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/CodahaleGraphiteSink.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SchemaHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnRunnerWithMetrics.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ProcessContext.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/GroupAlsoByWindowViaOutputBufferFn.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/NoOpStepContext.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/CoderHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/KVHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOuputCoder.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/ReduceFnRunnerHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/RowHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/SideInputBroadcast.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/PipelineTranslatorStreaming.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/package-info.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/CachedSideInputReader.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/SideInputStorage.java create mode 100644 runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/package-info.java create mode 100644 runners/spark/src/main/resources/log4j.properties create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetricsSinkRule.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsPusherTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/EncodersTest.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java create mode 100644 runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/package-info.java rename runners/spark/src/test/resources/{log4j.properties => log4j-test.properties} (96%)