This is an automated email from the ASF dual-hosted git repository.

aromanenko pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


    from 0b415fd  Merge pull request #10160 More compartmentalization of 
bundle-based-runner only utilities.
     add 6f72c93  [BEAM-8470] Add an empty spark-structured-streaming runner 
project targeting spark 2.4.0
     add 4ca7e55  [BEAM-8470] Fix missing dep
     add 0fc0d0a4 [BEAM-8470] Add SparkPipelineOptions
     add e8ca23e  [BEAM-8470] Start pipeline translation
     add 00964d2  [BEAM-8470] Add global pipeline translation structure
     add a3b278e  [BEAM-8470] Add nodes translators structure
     add ef4941a  [BEAM-8470] Wire node translators with pipeline translator
     add 8a8dc1e  [BEAM-8470] Renames: better differenciate pipeline translator 
for transform translator
     add cdfd589  [BEAM-8470] Organise methods in PipelineTranslator
     add 38eca95  [BEAM-8470] Initialise BatchTranslationContext
     add 80f2d8c  [BEAM-8470] Refactoring: -move batch/streaming common 
translation visitor and utility methods to PipelineTranslator -rename batch 
dedicated classes to Batch* to differentiate with their streaming counterparts 
-Introduce TranslationContext for common batch/streaming components
     add baf210f  [BEAM-8470] Make transform translation clearer: renaming, 
comments
     add b65a9da  [BEAM-8470] Improve javadocs
     add 0434749  [BEAM-8470] Move SparkTransformOverrides to correct package
     add 4372c7e  [BEAM-8470] Move common translation context components to 
superclass
     add 49b666b  [BEAM-8470] apply spotless
     add 0d6906a  [BEAM-8470] Make codestyle and firebug happy
     add ef97440  [BEAM-8470] Add TODOs
     add 9abf8ac  [BEAM-8470] Post-pone batch qualifier in all classes names 
for readability
     add 11a6e19  [BEAM-8470] Add precise TODO for multiple TransformTranslator 
per transform URN
     add 47ed3d1  [BEAM-8470] Added SparkRunnerRegistrar
     add 022a0d0  [BEAM-8470] Add basic pipeline execution. Refactor 
translatePipeline() to return the translationContext on which we can run 
startPipeline()
     add 96b3f36  [BEAM-8470] Create PCollections manipulation methods
     add b0c42af  [BEAM-8470] Create Datasets manipulation methods
     add 2c5cb23  [BEAM-8470] Add Flatten transformation translator
     add 9f1bf60  [BEAM-8470] Add primitive GroupByKeyTranslatorBatch 
implementation
     add 98ea9fb  [BEAM-8470] Use Iterators.transform() to return Iterable
     add 0b55323  [BEAM-8470] Implement read transform
     add 4c91a57  [BEAM-8470] update TODO
     add 4adf3bb  [BEAM-8470] Apply spotless
     add 6b4b916  [BEAM-8470] start source instanciation
     add ff60578  [BEAM-8470] Improve exception flow
     add 2ee98da  [BEAM-8470] Improve type enforcement in ReadSourceTranslator
     add fc3abf5  [BEAM-8470] Experiment over using spark Catalog to pass in 
Beam Source through spark Table
     add 4746d9b  [BEAM-8470] Add source mocks
     add e45e48d  [BEAM-8470] fix mock, wire mock in translators and create a 
main test.
     add 7d7fe77  [BEAM-8470] Use raw WindowedValue so that spark Encoders 
could work (temporary)
     add 9d84a0f  [BEAM-8470] clean deps
     add b4032aa  [BEAM-8470] Move DatasetSourceMock to proper batch mode
     add a7ad1ab  [BEAM-8470] Run pipeline in batch mode or in streaming mode
     add 141e4bc  [BEAM-8470] Split batch and streaming sources and translators
     add 8954c50  [BEAM-8470] Use raw Encoder<WindowedValue> also in regular 
ReadSourceTranslatorBatch
     add 00ef268  [BEAM-8470] Clean
     add c426c98  [BEAM-8470] Add ReadSourceTranslatorStreaming
     add 1740dc4  [BEAM-8470] Move Source and translator mocks to a mock 
package.
     add 7819918  [BEAM-8470] Pass Beam Source and PipelineOptions to the spark 
DataSource as serialized strings
     add b10aa53  [BEAM-8470] Refactor DatasetSource fields
     add c4bb08c  [BEAM-8470] Wire real SourceTransform and not mock and update 
the test
     add 5bbea63  [BEAM-8470] Add missing 0-arg public constructor
     add 0dbe26f  [BEAM-8470] Use new PipelineOptionsSerializationUtils
     add 43052d3  [BEAM-8470] Apply spotless and fix  checkstyle
     add 17ca18b  [BEAM-8470] Add a dummy schema for reader
     add 2e8393b  [BEAM-8470] Add empty 0-arg constructor for mock source
     add 43ff919  [BEAM-8470] Clean
     add c8ad727  [BEAM-8470] Checkstyle and Findbugs
     add 6e3575d  [BEAM-8470] Refactor SourceTest to a UTest instaed of a main
     add c221aaa  [BEAM-8470] Fix pipeline triggering: use a spark action 
instead of writing the dataset
     add c26c421  [BEAM-8470] improve readability of options passing to the 
source
     add ff69ded  [BEAM-8470] Clean unneeded fields in DatasetReader
     add 524667e  [BEAM-8470] Fix serialization issues
     add 638bdae  [BEAM-8470] Add SerializationDebugger
     add fd354fa  [BEAM-8470] Add serialization test
     add 163102b  [BEAM-8470] Move SourceTest to same package as tested class
     add 68fc6d5  [BEAM-8470] Fix SourceTest
     add e4c76fc  [BEAM-8470] Simplify beam reader creation as it created once 
the source as already been partitioned
     add 90463a0  [BEAM-8470] Put all transform translators Serializable
     add 940b484  [BEAM-8470] Enable test mode
     add 0b156bf  [BEAM-8470] Enable gradle build scan
     add 74080c1  [BEAM-8470] Add flatten test
     add 740131e  [BEAM-8470] First attempt for ParDo primitive implementation
     add 0cedcd7  [BEAM-8470] Serialize windowedValue to byte[] in source to be 
able to specify a binary dataset schema and deserialize windowedValue from Row 
to get a dataset<WindowedValue>
     add 79b075a  [BEAM-8470] Comment schema choices
     add 5d1b2b5  [BEAM-8470] Fix errorprone
     add 0529996  [BEAM-8470] Fix testMode output to comply with new binary 
schema
     add 696597c  [BEAM-8470] Cleaning
     add 91964b9  [BEAM-8470] Remove bundleSize parameter and always use spark 
default parallelism
     add 35051e0  [BEAM-8470] Fix split bug
     add 6902b4b  [BEAM-8470] Clean
     add e5a36f0  [BEAM-8470] Add ParDoTest
     add 057e0ac  [BEAM-8470] Address minor review notes
     add b706a74  [BEAM-8470] Clean
     add 83779db  [BEAM-8470] Add GroupByKeyTest
     add 0330f32  [BEAM-8470] Add comments and TODO to GroupByKeyTranslatorBatch
     add 1cb53e3  [BEAM-8470] Fix type checking with Encoder of WindowedValue<T>
     add e363586  [BEAM-8470] Port latest changes of ReadSourceTranslatorBatch 
to ReadSourceTranslatorStreaming
     add 9b9cca6  [BEAM-8470] Remove no more needed putDatasetRaw
     add cb1a085  [BEAM-8470] Add ComplexSourceTest
     add c13e8ae  [BEAM-8470] Fail in case of having SideInouts or State/Timers
     add 83e4067  [BEAM-8470] Fix Encoders: create an Encoder for every 
manipulated type to avoid Spark fallback to genericRowWithSchema and cast to 
allow having Encoders with generic types such as WindowedValue<T> and get the 
type checking back
     add d19c8b9  [BEAM-8470] Apply spotless
     add 1051eba  [BEAM-8470] Fixed Javadoc error
     add ee67f21  [BEAM-8470] Rename SparkSideInputReader class and rename 
pruneOutput() to pruneOutputFilteredByTag()
     add 8e0f183  [BEAM-8470] Don't use deprecated 
sideInput.getWindowingStrategyInternal()
     add f0ff5b4  [BEAM-8470] Simplify logic of ParDo translator
     add 9c4e36d  [BEAM-8470] Fix kryo issue in GBK translator with a workaround
     add b3a740c  [BEAM-8470] Rename SparkOutputManager for consistency
     add 438f1e9  [BEAM-8470] Fix for test elements container in GroupByKeyTest
     add be1cd23  [BEAM-8470] Added "testTwoPardoInRow"
     add ed6b12c  [BEAM-8470] Add a test for the most simple possible Combine
     add 35f100c  [BEAM-8470] Rename SparkDoFnFilterFunction to 
DoFnFilterFunction for consistency
     add 8111241  [BEAM-8470] Generalize the use of SerializablePipelineOptions 
in place of (not serializable) PipelineOptions
     add 2d0c33e  [BEAM-8470] Fix getSideInputs
     add 8662ef8  [BEAM-8470] Extract binary schema creation in a helper class
     add 9a757d0  [BEAM-8470] First version of combinePerKey
     add 3da0276  [BEAM-8470] Improve type checking of Tuple2 encoder
     add ca39fe1  [BEAM-8470] Introduce WindowingHelpers (and helpers package) 
and use it in Pardo, GBK and CombinePerKey
     add f002039  [BEAM-8470] Fix combiner using KV as input, use binary 
encoders in place of accumulatorEncoder and outputEncoder, use helpers, spotless
     add cec975f  [BEAM-8470] Add combinePerKey and CombineGlobally tests
     add 39a978a  [BEAM-8470] Introduce RowHelpers
     add c3cd76f  [BEAM-8470] Add CombineGlobally translation to avoid 
translating Combine.perKey as a composite transform based on Combine.PerKey 
(which uses low perf GBK)
     add 9ee65d5  [BEAM-8470] Cleaning
     add 279d2ca  [BEAM-8470] Get back to classes in translators resolution 
because URNs cannot translate Combine.Globally
     add 55bf503  [BEAM-8470] Fix various type checking issues in 
Combine.Globally
     add d5dab93  [BEAM-8470] Update test with Long
     add fd1d8c5  [BEAM-8470] Fix combine. For unknown reason 
GenericRowWithSchema is used as input of combine so extract its content to be 
able to proceed
     add 5c074eb  [BEAM-8470] Use more generic Row instead of 
GenericRowWithSchema
     add 341c39f  [BEAM-8470] Add explanation about receiving a Row as input in 
the combiner
     add 89d38b1  [BEAM-8470] Fix encoder bug in combinePerkey
     add 6b109d3  [BEAM-8470] Cleaning
     add 9494886  [BEAM-8470] Implement WindowAssignTranslatorBatch
     add c83a607  [BEAM-8470] Implement WindowAssignTest
     add 9bf9385  [BEAM-8470] Fix javadoc
     add 8f5e9dc  [BEAM-8470] Added SideInput support
     add 915ca01  [BEAM-8470] Fix CheckStyle violations
     add 92cab8e  [BEAM-8470] Don't use Reshuffle translation
     add 6bbeeff  [BEAM-8470] Added using CachedSideInputReader
     add f8512c7  [BEAM-8470] Added TODO comment for ReshuffleTranslatorBatch
     add 19ee565  [BEAM-8470] And unchecked warning suppression
     add 1bf9a4a  [BEAM-8470] Add streaming source initialisation
     add 6ca706a  [BEAM-8470] Implement first streaming source
     add b5ccec6  [BEAM-8470] Add a TODO on spark output modes
     add 175940d  [BEAM-8470] Add transformators registry in 
PipelineTranslatorStreaming
     add ba59f10  [BEAM-8470] Add source streaming test
     add 80dde2d  [BEAM-8470] Specify checkpointLocation at the pipeline start
     add e494189  [BEAM-8470] Clean unneeded 0 arg constructor in batch source
     add e92adee  [BEAM-8470] Clean streaming source
     add 31a2be5  [BEAM-8470] Continue impl of offsets for streaming source
     add 8777607  [BEAM-8470] Deal with checkpoint and offset based read
     add 43088fb  [BEAM-8470] Apply spotless and fix spotbugs warnings
     add 75e3f7f  [BEAM-8470] Disable never ending test 
SimpleSourceTest.testUnboundedSource
     add 9e60f27  [BEAM-8470] Fix access level issues, typos and modernize code 
to Java 8 style
     add 0492206  [BEAM-8470] Merge Spark Structured Streaming runner into main 
Spark module. Remove spark-structured-streaming module.Rename Runner to 
SparkStructuredStreamingRunner. Remove specific PipelineOotions and Registrar 
for Structured Streaming Runner
     add 0e37bae  [BEAM-8470] Fix non-vendored imports from Spark Streaming 
Runner classes
     add 01a2fd4  [BEAM-8470] Pass doFnSchemaInformation to ParDo batch 
translation
     add f950dbf  [BEAM-8470] Fix spotless issues after rebase
     add 82cf713  [BEAM-8470] Fix logging levels in Spark Structured Streaming 
translation
     add f44c863  [BEAM-8470] Add SparkStructuredStreamingPipelineOptions and 
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added 
to have the new   runner rely only on its specific options.
     add d0f2ba3  [BEAM-8470] Rename SparkPipelineResult to 
SparkStructuredStreamingPipelineResult This is done to avoid an eventual 
collision with the one in SparkRunner. However this cannot happen at this 
moment because it is package private, so it is also done for consistency.
     add 23e3c53  [BEAM-8470] Use PAssert in Spark Structured Streaming 
transform tests
     add 3ba3e51  [BEAM-8470] Ignore spark offsets (cf javadoc)
     add 05266b9  [BEAM-8470] implement source.stop
     add 75209f9  [BEAM-8470] Update javadoc
     add f11e382  [BEAM-8470] Apply Spotless
     add 075a79b  [BEAM-8470] Enable batch Validates Runner tests for 
Structured Streaming Runner
     add de5b17b  [BEAM-8470] Limit the number of partitions to make tests go 
300% faster
     add 71edc3e  [BEAM-8470] Fixes ParDo not calling setup and not tearing 
down if exception on startBundle
     add 5972c51  [BEAM-8470] Pass transform based doFnSchemaInformation in 
ParDo translation
     add 3dbcaba  [BEAM-8470] Consider null object case on RowHelpers, fixes 
empty side inputs tests.
     add a687d5d  [BEAM-8470] Put back batch/simpleSourceTest.testBoundedSource
     add b70f605  [BEAM-8470] Update windowAssignTest
     add e070a69  [BEAM-8470] Add comment about checkpoint mark
     add 2c1437c  [BEAM-8470] Re-code GroupByKeyTranslatorBatch to conserve 
windowing instead of unwindowing/windowing(GlobalWindow): simplify code, use 
ReduceFnRunner to merge the windows
     add 01184ff  [BEAM-8470] re-enable reduceFnRunner timers for output
     add dfed0a5  [BEAM-8470] Improve visibility of debug messages
     add c7dd248  [BEAM-8470] Add a test that GBK preserves windowing
     add 65e1b79  [BEAM-8470] Add TODO in Combine translations
     add 346f2f4  [BEAM-8470] Update KVHelpers.extractKey() to deal with 
WindowedValue and update GBK and CPK
     add c9f4525  [BEAM-8470] Fix comment about schemas
     add b99c9ef  [BEAM-8470] Implement reduce part of CombineGlobally 
translation with windowing
     add 1b6339e  [BEAM-8470] Output data after combine
     add 66f6021  [BEAM-8470] Implement merge accumulators part of 
CombineGlobally translation with windowing
     add 6168070  [BEAM-8470] Fix encoder in combine call
     add 0c00e2f  [BEAM-8470] Revert extractKey while combinePerKey is not done 
(so that it compiles)
     add f9ecf1f  [BEAM-8470] Apply a groupByKey avoids for some reason that 
the spark structured streaming fmwk casts data to Row which makes it impossible 
to deserialize without the coder shipped into the data. For performance reasons 
(avoid memory consumption and having to deserialize), we do not ship coder + 
data. Also add a mapparitions before GBK to avoid shuffling
     add abaf140  [BEAM-8470] Fix case when a window does not merge into any 
other window
     add 319e01f  [BEAM-8470] Fix wrong encoder in combineGlobally GBK
     add a759ddc  [BEAM-8470] Fix bug in the window merging logic
     add d609213  [BEAM-8470] Remove the mapPartition that adds a key per 
partition because otherwise spark will reduce values per key instead of globally
     add 2cbd35d  [BEAM-8470] Remove CombineGlobally translation because it is 
less performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
     add 24c137a  [BEAM-8470] Now that there is only Combine.PerKey 
translation, make only one Aggregator
     add b7e4a8b  [BEAM-8470] Clean no more needed KVHelpers
     add 5215a5a  [BEAM-8470] Clean not more needed RowHelpers
     add 4e06ffe  [BEAM-8470] Clean not more needed WindowingHelpers
     add c83168c  [BEAM-8470] Fix javadoc of AggregatorCombiner
     add 5898cf0  [BEAM-8470] Fixed immutable list bug
     add 37b5fa8  [BEAM-8470] add comment in combine globally test
     add 6eea481  [BEAM-8470] Clean groupByKeyTest
     add 3e33078  [BEAM-8470] Add a test that combine per key preserves 
windowing
     add 55d865b  [BEAM-8470] Ignore for now not working test 
testCombineGlobally
     add 65ffdd8  [BEAM-8470] Add metrics support in DoFn
     add e2b57ac  [BEAM-8470] Add missing dependencies to run Spark Structured 
Streaming Runner on Nexmark
     add 323d10f  [BEAM-8470] Add setEnableSparkMetricSinks() method
     add 2f1d38b  [BEAM-8470] Fix javadoc
     add 06559d2  [BEAM-8470] Fix accumulators initialization in Combine that 
prevented CombineGlobally to work.
     add 3cce68c  [BEAM-8470] Add a test to check that CombineGlobally 
preserves windowing
     add 68b6f98  [BEAM-8470] Persist all output Dataset if there are multiple 
outputs in pipeline Enabled Use*Metrics tests
     add 6084764  [BEAM-8470] Added metrics sinks and tests
     add 69c514f  [BEAM-8470] Make spotless happy
     add 4cab501  [BEAM-8470] Add PipelineResults to Spark structured streaming.
     add a1ec37a  [BEAM-8470] Update log4j configuration
     add 2a53b32  [BEAM-8470] Add spark execution plans extended debug messages.
     add f34de05  [BEAM-8470] Print number of leaf datasets
     add 892b742  [BEAM-8470] fixup! Add PipelineResults to Spark structured 
streaming.
     add 613bac9  [BEAM-8470] Remove no more needed AggregatorCombinerPerKey 
(there is only AggregatorCombiner)
     add d024335  [BEAM-8470] After testing performance and correctness, launch 
pipeline with dataset.foreach(). Make both test mode and production mode use 
foreach for uniformity. Move dataset print as a utility method
     add 7a30c5a  [BEAM-8470] Improve Pardo translation performance: avoid 
calling a filter transform when there is only one output tag
     add 0e42c18  [BEAM-8470] Use "sparkMaster" in local mode to obtain number 
of shuffle partitions + spotless apply
     add 7f08fc8  [BEAM-8470] Wrap Beam Coders into Spark Encoders using 
ExpressionEncoder: serialization part
     add 2071f1c  [BEAM-8470] Wrap Beam Coders into Spark Encoders using 
ExpressionEncoder: deserialization part
     add 6f0e9fa  [BEAM-8470] type erasure: spark encoders require a Class<T>, 
pass Object and cast to Class<T>
     add 965814a  [BEAM-8470] Fix scala Product in Encoders to avoid 
StackEverflow
     add 165d8b7  [BEAM-8470] Conform to spark ExpressionEncoders: pass 
classTags, implement scala Product, pass children from within the 
ExpressionEncoder, fix visibilities
     add 609aace  [BEAM-8470] Add a simple spark native test to test Beam 
coders wrapping into Spark Encoders
     add 09e882e  [BEAM-8470] Fix code generation in Beam coder wrapper
     add d715e37  [BEAM-8470] Lazy init coder because coder instance cannot be 
interpolated by catalyst
     add d27962c  [BEAM-8470] Fix warning in coder construction by reflexion
     add fe46a63  [BEAM-8470] Fix ExpressionEncoder generated code: typos, try 
catch, fqcn
     add 64a978d  [BEAM-8470] Fix getting the output value in code generation
     add cfecb40  [BEAM-8470] Fix beam coder lazy init using reflexion: use 
.clas + try catch + cast
     add 0d13b77  [BEAM-8470] Remove lazy init of beam coder because there is 
no generic way on instanciating a beam coder
     add 1a6e662  [BEAM-8470] Remove example code
     add e45e7b5  [BEAM-8470] Fix equal and hashcode
     add dd63c1b  [BEAM-8470] Fix generated code: uniform exceptions catching, 
fix parenthesis and variable declarations
     add 75a8b24  [BEAM-8470] Add an assert of equality in the encoders test
     add ff50364  [BEAM-8470] Apply spotless and checkstyle and add javadocs
     add 4e215e0  [BEAM-8470] Wrap exceptions in UserCoderExceptions
     add 8ed3db6  [BEAM-8470] Put Encoders expressions serializable
     add 465d570  [BEAM-8470] Catch Exception instead of IOException because 
some coders to not throw Exceptions at all (e.g.VoidCoder)
     add 4579663  [BEAM-8470] Apply new Encoders to CombinePerKey
     add d246520  [BEAM-8470] Apply new Encoders to Read source
     add a448cfb  [BEAM-8470] Improve performance of source: the mapper already 
calls windowedValueCoder.decode, no need to call it also in the Spark encoder
     add c875dbc  [BEAM-8470] Ignore long time failing test: 
SparkMetricsSinkTest
     add be7415d  [BEAM-8470] Apply new Encoders to Window assign translation
     add cdd6e1a  [BEAM-8470] Apply new Encoders to AggregatorCombiner
     add 9ae6a57  [BEAM-8470] Create a Tuple2Coder to encode scala tuple2
     add f827291  [BEAM-8470] Apply new Encoders to GroupByKey
     add 76125e9  [BEAM-8470] Apply new Encoders to Pardo. Replace Tuple2Coder 
with MultiOutputCoder to deal with multiple output to use in Spark Encoder for 
DoFnRunner
     add f492ccf  [BEAM-8470] Apply spotless, fix typo and javadoc
     add d81d2f6  [BEAM-8470] Use beam encoders also in the output of the 
source translation
     add 9599546  [BEAM-8470] Remove unneeded cast
     add 0e072d3  [BEAM-8470] Fix: Remove generic hack of using object. Use 
actual Coder encodedType in Encoders
     add da8dfbd  [BEAM-8470] Remove Encoders based on kryo now that we call 
Beam coders in the runner
     add 470d04f  [BEAM-8470] Add a jenkins job for validates runner tests in 
the new spark runner
     add 17ace84  [BEAM-8470] Apply spotless
     add 2ca30a4  [BEAM-8470] Rebase on master: pass sideInputMapping in 
SimpleDoFnRunner as needed now in the API
     add 99e081d  Fix SpotBugs
     add 52aae7f  [BEAM-8470] simplify coders in combinePerKey translation
     add 803bb0b  [BEAM-8470] Fix combiner. Do not reuse instance of accumulator
     add a4c4ee7  [BEAM-8470] input windows can arrive exploded (for sliding 
windows). As a result an input has multiple windows. So we need to consider 
that the accumulator can have multiple windows
     add ea47c71  [BEAM-8470] Add a combine test with sliding windows
     add 89eb1f0  [BEAM-8470] Add a test to test combine translation on 
binaryCombineFn with sliding windows
     add 4416a2e  [BEAM-8470] Fix tests: use correct 
SparkStructuredStreamingPipelineOptions, set testMode to true. Some renaming
     add b980cf9  [BEAM-8470] Fix wrong expected results in 
CombineTest.testBinaryCombineWithSlidingWindows
     add 22d98df  [BEAM-8470] Add disclaimers about this runner being 
experimental
     add d680252  [BEAM-8470] Fix: create an empty accumulator in 
combine.mergeAccumulators, because this method modifies its first input 
accumulator. Decrease memory usage by storing only accumulator and timestamp in 
the combine.merge map
     add 9922bcb  [BEAM-8470] Apply spotless
     add a0872bc  [BEAM-8470] Add a countPerElement test with sliding windows
     add c9ce072  [BEAM-8470] Fix the output timestamps of combine: timestamps 
must be merged to a window before combining
     add f516554  [BEAM-8470] set log level to info to avoid resource 
consumption in production mode
     add 27dcc48  [BEAM-8470] Fix 
CombineTest.testCountPerElementWithSlidingWindows expected results
     add f5cfbc1  [BEAM-8470] Remove "validatesStructuredStreamingRunnerBatch" 
from "validatesRunner" task
     add ac3132c  [BEAM-8470] Fix timestamps in combine output: assign the 
timestamp to the window and not merge all the timestamps before combine
     add 18059ee  Merge pull request #9866: [BEAM-8470] Create a new Spark 
runner based on Spark Structured streaming framework

No new revisions were added by this update.

Summary of changes:
 ...ValidatesRunner_SparkStructuredStreaming.groovy |  43 +++
 .../org/apache/beam/gradle/BeamModulePlugin.groovy |   2 +
 runners/spark/build.gradle                         |  47 +++
 .../runners/spark/SparkCommonPipelineOptions.java  |  74 +++++
 .../beam/runners/spark/SparkPipelineOptions.java   |  47 +--
 .../beam/runners/spark/SparkRunnerRegistrar.java   |   8 +-
 .../SparkStructuredStreamingPipelineOptions.java   |  41 +++
 .../SparkStructuredStreamingPipelineResult.java    | 151 ++++++++++
 .../SparkStructuredStreamingRunner.java            | 221 ++++++++++++++
 .../aggregators/AggregatorsAccumulator.java        |  70 +++++
 .../aggregators/NamedAggregators.java              | 110 +++++++
 .../aggregators/NamedAggregatorsAccumulator.java   |  63 ++++
 .../aggregators/package-info.java                  |  20 ++
 .../structuredstreaming/examples/WordCount.java    | 132 +++++++++
 .../metrics/AggregatorMetric.java                  |  39 +++
 .../metrics/AggregatorMetricSource.java            |  49 ++++
 .../metrics/CompositeSource.java                   |  45 +++
 .../metrics/MetricsAccumulator.java                |  71 +++++
 .../MetricsContainerStepMapAccumulator.java        |  65 +++++
 .../metrics/SparkBeamMetric.java                   |  89 ++++++
 .../metrics/SparkBeamMetricSource.java             |  48 +++
 .../metrics/SparkMetricsContainerStepMap.java      |  42 +++
 .../metrics/WithMetricsSupport.java                | 177 +++++++++++
 .../structuredstreaming/metrics/package-info.java  |  20 ++
 .../metrics/sink/CodahaleCsvSink.java              |  36 +++
 .../metrics/sink/CodahaleGraphiteSink.java         |  34 +++
 .../metrics/sink/package-info.java                 |  20 ++
 .../spark/structuredstreaming/package-info.java    |  20 ++
 .../translation/PipelineTranslator.java            | 215 ++++++++++++++
 .../translation/SchemaHelpers.java                 |  37 +++
 .../translation/SparkTransformOverrides.java       |  52 ++++
 .../translation/TransformTranslator.java           |  28 ++
 .../translation/TranslationContext.java            | 257 ++++++++++++++++
 .../translation/batch/AggregatorCombiner.java      | 238 +++++++++++++++
 .../batch/CombinePerKeyTranslatorBatch.java        | 111 +++++++
 .../CreatePCollectionViewTranslatorBatch.java      |  59 ++++
 .../translation/batch/DatasetSourceBatch.java      | 161 ++++++++++
 .../translation/batch/DoFnFunction.java            | 161 ++++++++++
 .../translation/batch/DoFnRunnerWithMetrics.java   |  96 ++++++
 .../translation/batch/FlattenTranslatorBatch.java  |  63 ++++
 .../batch/GroupByKeyTranslatorBatch.java           | 121 ++++++++
 .../translation/batch/ParDoTranslatorBatch.java    | 251 ++++++++++++++++
 .../translation/batch/PipelineTranslatorBatch.java |  94 ++++++
 .../translation/batch/ProcessContext.java          | 138 +++++++++
 .../batch/ReadSourceTranslatorBatch.java           |  85 ++++++
 .../batch/ReshuffleTranslatorBatch.java            |  29 ++
 .../batch/WindowAssignTranslatorBatch.java         |  58 ++++
 .../GroupAlsoByWindowViaOutputBufferFn.java        | 165 +++++++++++
 .../batch/functions/NoOpStepContext.java           |  36 +++
 .../batch/functions/SparkSideInputReader.java      | 146 ++++++++++
 .../translation/batch/functions/package-info.java  |  20 ++
 .../translation/batch/package-info.java            |  20 ++
 .../translation/helpers/CoderHelpers.java          |  63 ++++
 .../translation/helpers/EncoderHelpers.java        | 324 +++++++++++++++++++++
 .../translation/helpers/KVHelpers.java             |  31 ++
 .../translation/helpers/MultiOuputCoder.java       |  80 +++++
 .../translation/helpers/ReduceFnRunnerHelpers.java |  88 ++++++
 .../translation/helpers/RowHelpers.java            |  75 +++++
 .../translation/helpers/SideInputBroadcast.java    |  46 +++
 .../translation/helpers/WindowingHelpers.java      |  82 ++++++
 .../translation/helpers/package-info.java          |  20 ++
 .../translation/package-info.java                  |  20 ++
 .../streaming/DatasetSourceStreaming.java          | 258 ++++++++++++++++
 .../streaming/PipelineTranslatorStreaming.java     |  88 ++++++
 .../streaming/ReadSourceTranslatorStreaming.java   |  85 ++++++
 .../translation/streaming/package-info.java        |  20 ++
 .../translation/utils/CachedSideInputReader.java   |  91 ++++++
 .../translation/utils/SideInputStorage.java        | 103 +++++++
 .../translation/utils/package-info.java            |  20 ++
 runners/spark/src/main/resources/log4j.properties  |  40 +++
 .../runners/spark/SparkRunnerRegistrarTest.java    |   7 +-
 .../StructuredStreamingPipelineStateTest.java      | 225 ++++++++++++++
 .../aggregators/metrics/sink/InMemoryMetrics.java  |  82 ++++++
 .../metrics/sink/InMemoryMetricsSinkRule.java      |  28 ++
 .../metrics/sink/SparkMetricsSinkTest.java         |  84 ++++++
 .../metrics/BeamMetricTest.java                    |  47 +++
 .../metrics/MetricsPusherTest.java                 |  91 ++++++
 .../translation/batch/CombineTest.java             | 186 ++++++++++++
 .../translation/batch/ComplexSourceTest.java       |  86 ++++++
 .../translation/batch/FlattenTest.java             |  59 ++++
 .../translation/batch/GroupByKeyTest.java          | 124 ++++++++
 .../translation/batch/ParDoTest.java               | 153 ++++++++++
 .../translation/batch/SimpleSourceTest.java        | 101 +++++++
 .../translation/batch/WindowAssignTest.java        |  69 +++++
 .../translation/streaming/SimpleSourceTest.java    |  57 ++++
 .../structuredstreaming/utils/EncodersTest.java    |  52 ++++
 .../utils/SerializationDebugger.java               | 115 ++++++++
 .../structuredstreaming/utils/package-info.java    |  20 ++
 .../{log4j.properties => log4j-test.properties}    |   2 +-
 sdks/java/testing/nexmark/build.gradle             |   3 +-
 90 files changed, 7498 insertions(+), 52 deletions(-)
 create mode 100644 
.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming.groovy
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkCommonPipelineOptions.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineOptions.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingPipelineResult.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/SparkStructuredStreamingRunner.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/AggregatorsAccumulator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/NamedAggregators.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/NamedAggregatorsAccumulator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/examples/WordCount.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/AggregatorMetric.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/AggregatorMetricSource.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/CompositeSource.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsAccumulator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsContainerStepMapAccumulator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkBeamMetric.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkBeamMetricSource.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/SparkMetricsContainerStepMap.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/WithMetricsSupport.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/CodahaleCsvSink.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/CodahaleGraphiteSink.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/metrics/sink/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/PipelineTranslator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SchemaHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/SparkTransformOverrides.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TransformTranslator.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/TranslationContext.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/AggregatorCombiner.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombinePerKeyTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CreatePCollectionViewTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DatasetSourceBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnFunction.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/DoFnRunnerWithMetrics.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/PipelineTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ProcessContext.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReadSourceTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ReshuffleTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTranslatorBatch.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/GroupAlsoByWindowViaOutputBufferFn.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/NoOpStepContext.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/CoderHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/EncoderHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/KVHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/MultiOuputCoder.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/ReduceFnRunnerHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/RowHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/SideInputBroadcast.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/WindowingHelpers.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/helpers/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/DatasetSourceStreaming.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/PipelineTranslatorStreaming.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/ReadSourceTranslatorStreaming.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/package-info.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/CachedSideInputReader.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/SideInputStorage.java
 create mode 100644 
runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/utils/package-info.java
 create mode 100644 runners/spark/src/main/resources/log4j.properties
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/StructuredStreamingPipelineStateTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetrics.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/InMemoryMetricsSinkRule.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/aggregators/metrics/sink/SparkMetricsSinkTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/BeamMetricTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/metrics/MetricsPusherTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/CombineTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ComplexSourceTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/FlattenTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/GroupByKeyTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/ParDoTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/SimpleSourceTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/WindowAssignTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/translation/streaming/SimpleSourceTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/EncodersTest.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/SerializationDebugger.java
 create mode 100644 
runners/spark/src/test/java/org/apache/beam/runners/spark/structuredstreaming/utils/package-info.java
 rename runners/spark/src/test/resources/{log4j.properties => 
log4j-test.properties} (96%)

Reply via email to