This is an automated email from the ASF dual-hosted git repository.
echauchot pushed a change to branch spark-runner_structured-streaming
in repository https://gitbox.apache.org/repos/asf/beam.git.
discard 620a27a Remove Encoders based on kryo now that we call Beam coders in
the runner
discard 824b344 Fix: Remove generic hack of using object. Use actual Coder
encodedType in Encoders
discard 27ef6de Remove unneeded cast
discard 6a27839 Use beam encoders also in the output of the source translation
discard 62a87b6 Apply spotless, fix typo and javadoc
discard c5e78a0 Apply new Encoders to Pardo. Replace Tuple2Coder with
MultiOutputCoder to deal with multiple output to use in Spark Encoder for
DoFnRunner
discard 039f58a Apply new Encoders to GroupByKey
discard 21accab Create a Tuple2Coder to encode scala tuple2
discard 29f7e93 Apply new Encoders to AggregatorCombiner
discard 7f1060a Apply new Encoders to Window assign translation
discard c48d032 Ignore long time failing test: SparkMetricsSinkTest
discard 68d3d67 Improve performance of source: the mapper already calls
windowedValueCoder.decode, no need to call it also in the Spark encoder
discard 3cc256e Apply new Encoders to Read source
discard 7d456b4 Apply new Encoders to CombinePerKey
discard c33fdda Catch Exception instead of IOException because some coders do
not throw Exceptions at all (e.g. VoidCoder)
discard c8bfcf3 Make Encoders expressions serializable
discard 72c267c Wrap exceptions in UserCoderExceptions
discard c6f2ac9 Apply spotless and checkstyle and add javadocs
discard 78b2d22 Add an assert of equality in the encoders test
discard 34e8aa8 Fix generated code: uniform exceptions catching, fix
parenthesis and variable declarations
discard f48067b Fix equal and hashcode
discard ca01777 Remove example code
discard 50060a8 Remove lazy init of beam coder because there is no generic
way of instantiating a beam coder
discard 0cf2c87 Fix beam coder lazy init using reflection: use .class + try
catch + cast
discard d7c9a4a Fix getting the output value in code generation
discard 8b07ec8 Fix ExpressionEncoder generated code: typos, try catch, fqcn
discard fdba22d Fix warning in coder construction by reflection
discard e6b68a8 Lazy init coder because coder instance cannot be interpolated
by catalyst
discard d5645ff Fix code generation in Beam coder wrapper
discard e4478ff Add a simple spark native test to test Beam coders wrapping
into Spark Encoders
discard 2aaf07a Conform to spark ExpressionEncoders: pass classTags,
implement scala Product, pass children from within the ExpressionEncoder, fix
visibilities
discard fff5092 Fix scala Product in Encoders to avoid StackOverflow
discard c9e3534 type erasure: spark encoders require a Class<T>, pass Object
and cast to Class<T>
discard 5fa6331 Wrap Beam Coders into Spark Encoders using ExpressionEncoder:
deserialization part
discard a5c7da3 Wrap Beam Coders into Spark Encoders using ExpressionEncoder:
serialization part
discard 20d5bbd Use "sparkMaster" in local mode to obtain number of shuffle
partitions + spotless apply
discard 22d6466 Improve Pardo translation performance: avoid calling a filter
transform when there is only one output tag
discard 93d425a After testing performance and correctness, launch pipeline
with dataset.foreach(). Make both test mode and production mode use foreach for
uniformity. Move dataset print as a utility method
discard 61f487f Remove no more needed AggregatorCombinerPerKey (there is only
AggregatorCombiner)
discard f8a5046 fixup! Add PipelineResults to Spark structured streaming.
discard 8aafd50 Print number of leaf datasets
discard ec43374 Add spark execution plans extended debug messages.
discard 3b15128 Update log4j configuration
discard cde225a Add PipelineResults to Spark structured streaming.
discard 0e36b19 Make spotless happy
discard 4aaf456 Added metrics sinks and tests
discard dc939c8 Persist all output Dataset if there are multiple outputs in
pipeline. Enabled Use*Metrics tests
discard dab3c2e Add a test to check that CombineGlobally preserves windowing
discard 6e9ccdd Fix accumulators initialization in Combine that prevented
CombineGlobally to work.
discard a797884 Fix javadoc
discard 476bc20 Add setEnableSparkMetricSinks() method
discard 51ca79a Add missing dependencies to run Spark Structured Streaming
Runner on Nexmark
discard 5ed3e03 Add metrics support in DoFn
discard d29c64e Ignore for now not working test testCombineGlobally
discard ff50ccb Add a test that combine per key preserves windowing
discard 6638522 Clean groupByKeyTest
discard 8c499a5 add comment in combine globally test
discard f68ed7a Fixed immutable list bug
discard 9601649 Fix javadoc of AggregatorCombiner
discard 7784f30 Clean no more needed WindowingHelpers
discard 3bd95df Clean no more needed RowHelpers
discard 4b5af9c Clean no more needed KVHelpers
discard 569a9cb Now that there is only Combine.PerKey translation, make only
one Aggregator
discard ad0c179 Remove CombineGlobally translation because it is less
performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
discard 7b6c914 Remove the mapPartition that adds a key per partition because
otherwise spark will reduce values per key instead of globally
discard 1589877 Fix bug in the window merging logic
discard 4e34632 Fix wrong encoder in combineGlobally GBK
discard f00e7d5 Fix case when a window does not merge into any other window
discard f36e5c3 Applying a groupByKey avoids, for some reason, the spark
structured streaming framework casting data to Row, which makes it impossible
to deserialize without the coder shipped into the data. For performance reasons
(avoid memory consumption and having to deserialize), we do not ship coder +
data. Also add a mapPartitions before GBK to avoid shuffling
discard 4f4744a Revert extractKey while combinePerKey is not done (so that it
compiles)
discard 6de9acf Fix encoder in combine call
discard 70e3d66 Implement merge accumulators part of CombineGlobally
translation with windowing
discard 28ba572 Output data after combine
discard 960d245 Implement reduce part of CombineGlobally translation with
windowing
discard 595d9eb Fix comment about schemas
discard edaa37f Update KVHelpers.extractKey() to deal with WindowedValue and
update GBK and CPK
discard 5dc8c24 Add TODO in Combine translations
discard 14c703f Add a test that GBK preserves windowing
discard 28ee71c Improve visibility of debug messages
discard 47b7132 re-enable reduceFnRunner timers for output
discard fed93da Re-code GroupByKeyTranslatorBatch to conserve windowing
instead of unwindowing/windowing(GlobalWindow): simplify code, use
ReduceFnRunner to merge the windows
discard c23c07e Add comment about checkpoint mark
discard a3e29b4 Update windowAssignTest
discard 68e3ae2 Put back batch/simpleSourceTest.testBoundedSource
discard cc7a52d Consider null object case on RowHelpers, fixes empty side
inputs tests.
discard a94045c Pass transform based doFnSchemaInformation in ParDo
translation
discard 1be0c8a Fixes ParDo not calling setup and not tearing down if
exception on startBundle
discard 2bfd3d5 Limit the number of partitions to make tests go 300% faster
discard c2896b6 Enable batch Validates Runner tests for Structured Streaming
Runner
discard e4ef07b Apply Spotless
discard ab6a879 Update javadoc
discard fdcd346 implement source.stop
discard d2e65df Ignore spark offsets (cf javadoc)
discard d285dd1 Use PAssert in Spark Structured Streaming transform tests
discard 9354339 Rename SparkPipelineResult to
SparkStructuredStreamingPipelineResult This is done to avoid an eventual
collision with the one in SparkRunner. However this cannot happen at this
moment because it is package private, so it is also done for consistency.
discard fc877cd Add SparkStructuredStreamingPipelineOptions and
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added
to have the new runner rely only on its specific options.
discard 47376e3 Fix logging levels in Spark Structured Streaming translation
discard 03eb450 Fix spotless issues after rebase
discard 58f97b8 Pass doFnSchemaInformation to ParDo batch translation
discard 2628393 Fix non-vendored imports from Spark Streaming Runner classes
discard cb72394 Merge Spark Structured Streaming runner into main Spark
module. Remove spark-structured-streaming module. Rename Runner to
SparkStructuredStreamingRunner. Remove specific PipelineOptions and Registrar
for Structured Streaming Runner
discard c9a8c8c Fix access level issues, typos and modernize code to Java 8
style
discard 19e5fdf Disable never ending test SimpleSourceTest.testUnboundedSource
discard c68e875 Apply spotless and fix spotbugs warnings
discard 79e85ec Deal with checkpoint and offset based read
discard b7c68bd Continue impl of offsets for streaming source
discard afa6a48 Clean streaming source
discard 8caa982 Clean unneeded 0 arg constructor in batch source
discard 6e94948 Specify checkpointLocation at the pipeline start
discard 4527615 Add source streaming test
discard ce46b9b Add translators registry in PipelineTranslatorStreaming
discard cb5dffa Add a TODO on spark output modes
discard 4030fb0 Implement first streaming source
discard 81c0bbe Add streaming source initialisation
discard 2ad1f15 Add unchecked warning suppression
discard cb1a99c Added TODO comment for ReshuffleTranslatorBatch
discard 530dfb0 Added using CachedSideInputReader
discard d8ee03e Don't use Reshuffle translation
discard c879337 Fix CheckStyle violations
discard d759a19 Added SideInput support
discard 1355ece Fix javadoc
discard 41e6a19 Implement WindowAssignTest
discard bf2af77 Implement WindowAssignTranslatorBatch
discard 383d58d Cleaning
discard 1244549 Fix encoder bug in combinePerKey
discard ac67ada Add explanation about receiving a Row as input in the combiner
discard a2d1975 Use more generic Row instead of GenericRowWithSchema
discard a72afd8 Fix combine. For unknown reason GenericRowWithSchema is used
as input of combine so extract its content to be able to proceed
discard 00acd7d Update test with Long
discard b13839d Fix various type checking issues in Combine.Globally
discard 3c25348 Get back to classes in translators resolution because URNs
cannot translate Combine.Globally
discard 8d0a8b5 Cleaning
discard 0a88819 Add CombineGlobally translation to avoid translating
Combine.perKey as a composite transform based on Combine.PerKey (which uses low
perf GBK)
discard f48c109 Introduce RowHelpers
discard 2a1d74e Add combinePerKey and CombineGlobally tests
discard 684fc4a Fix combiner using KV as input, use binary encoders in place
of accumulatorEncoder and outputEncoder, use helpers, spotless
discard 72d338b Introduce WindowingHelpers (and helpers package) and use it
in Pardo, GBK and CombinePerKey
discard 4d3be61 Improve type checking of Tuple2 encoder
discard 2c465f8 First version of combinePerKey
discard d33508e Extract binary schema creation in a helper class
discard bfc10a2 Fix getSideInputs
discard 4cbdbb8 Generalize the use of SerializablePipelineOptions in place of
(not serializable) PipelineOptions
discard 08b580b Rename SparkDoFnFilterFunction to DoFnFilterFunction for
consistency
discard 674a048 Add a test for the most simple possible Combine
discard 3716673 Added "testTwoPardoInRow"
discard 2482172 Fix for test elements container in GroupByKeyTest
discard bc7fab4 Rename SparkOutputManager for consistency
discard 314d935 Fix kryo issue in GBK translator with a workaround
discard 3ba4bc6 Simplify logic of ParDo translator
discard b7a76d9 Don't use deprecated sideInput.getWindowingStrategyInternal()
discard 6d723b1 Rename SparkSideInputReader class and rename pruneOutput() to
pruneOutputFilteredByTag()
discard 9bee53f Fixed Javadoc error
discard 2f37994 Apply spotless
discard 7c5b778 Fix Encoders: create an Encoder for every manipulated type to
avoid Spark fallback to genericRowWithSchema and cast to allow having Encoders
with generic types such as WindowedValue<T> and get the type checking back
discard 1b244e9 Fail in case of having SideInputs or State/Timers
discard a74d149 Add ComplexSourceTest
discard 21902b6 Remove no more needed putDatasetRaw
discard a8e50ad Port latest changes of ReadSourceTranslatorBatch to
ReadSourceTranslatorStreaming
discard 14368ff Fix type checking with Encoder of WindowedValue<T>
discard fe7fb4e Add comments and TODO to GroupByKeyTranslatorBatch
discard 779e621 Add GroupByKeyTest
discard fe19d6c Clean
discard ae44706 Address minor review notes
discard d4acf25 Add ParDoTest
discard f47bb0a Clean
discard 1d31831 Fix split bug
discard e12226a Remove bundleSize parameter and always use spark default
parallelism
discard defd5e6 Cleaning
discard b4230dc Fix testMode output to comply with new binary schema
discard 84bdf70 Fix errorprone
discard e4d4b4f Comment schema choices
discard 675bf94 Serialize windowedValue to byte[] in source to be able to
specify a binary dataset schema and deserialize windowedValue from Row to get a
dataset<WindowedValue>
discard e0c8fbd First attempt for ParDo primitive implementation
discard fc54404 Add flatten test
discard 3aea53b Enable gradle build scan
discard 3dc4dc3 Enable test mode
discard 047af3e Put all transform translators Serializable
discard 23ca155 Simplify beam reader creation as it is created once the
source has already been partitioned
discard e54cbc6 Fix SourceTest
discard 74693b2 Move SourceTest to same package as tested class
discard 00f2e11 Add serialization test
discard 1720e6b Add SerializationDebugger
discard 625056e Fix serialization issues
discard 0e85242 Clean unneeded fields in DatasetReader
discard 4936687 improve readability of options passing to the source
discard 0ffd98d Fix pipeline triggering: use a spark action instead of
writing the dataset
discard 02933bd Refactor SourceTest to a UTest instead of a main
discard bb830be Checkstyle and Findbugs
discard 037db6e Clean
discard d12cc14 Add empty 0-arg constructor for mock source
discard 9251dcb Add a dummy schema for reader
discard 02458a7 Apply spotless and fix checkstyle
discard 101f6f2 Use new PipelineOptionsSerializationUtils
discard ca5a120 Add missing 0-arg public constructor
discard 2b64bd2 Wire real SourceTransform and not mock and update the test
discard ca5f70c Refactor DatasetSource fields
discard 9d8dd90 Pass Beam Source and PipelineOptions to the spark DataSource
as serialized strings
discard 5b0f9a2 Move Source and translator mocks to a mock package.
discard 08c05d6 Add ReadSourceTranslatorStreaming
discard b617ba4 Clean
discard d6e905b Use raw Encoder<WindowedValue> also in regular
ReadSourceTranslatorBatch
discard 64c8202 Split batch and streaming sources and translators
discard f9ed0dd Run pipeline in batch mode or in streaming mode
discard 4aa321c Move DatasetSourceMock to proper batch mode
discard 44644d3 clean deps
discard cb03179 Use raw WindowedValue so that spark Encoders could work
(temporary)
discard bea28b1 fix mock, wire mock in translators and create a main test.
discard eb2fa49 Add source mocks
discard b57367f Experiment over using spark Catalog to pass in Beam Source
through spark Table
discard 6795e4c Improve type enforcement in ReadSourceTranslator
discard b8cb742 Improve exception flow
discard 3cb2a76 start source instantiation
discard 5701c75 Apply spotless
discard 238e57a update TODO
discard 3ae60a4 Implement read transform
discard b0920e3 Use Iterators.transform() to return Iterable
discard 1298b77 Add primitive GroupByKeyTranslatorBatch implementation
discard 9d721a7 Add Flatten transformation translator
discard 8c1a012 Create Datasets manipulation methods
discard b458c64 Create PCollections manipulation methods
discard d75ac4a Add basic pipeline execution. Refactor translatePipeline() to
return the translationContext on which we can run startPipeline()
discard cc07798 Added SparkRunnerRegistrar
discard cfd3abf Add precise TODO for multiple TransformTranslator per
transform URN
discard 7612d17 Postpone batch qualifier in all class names for readability
discard 1c7d381 Add TODOs
discard bf5619d Make codestyle and firebug happy
discard ff06081 apply spotless
discard 6ede65c Move common translation context components to superclass
discard 8875d48d Move SparkTransformOverrides to correct package
discard 83f44e0 Improve javadocs
discard ae2392e Make transform translation clearer: renaming, comments
discard 31af0f1 Refactoring: -move batch/streaming common translation visitor
and utility methods to PipelineTranslator -rename batch dedicated classes to
Batch* to differentiate with their streaming counterparts -Introduce
TranslationContext for common batch/streaming components
discard 8abe116 Initialise BatchTranslationContext
discard 6f1fda8 Organise methods in PipelineTranslator
discard 5990104 Renames: better differentiate pipeline translator from
transform translator
discard a5b9894 Wire node translators with pipeline translator
discard 9fea139 Add nodes translators structure
discard d7df588 Add global pipeline translation structure
discard ee8875a Start pipeline translation
discard 253123c Add SparkPipelineOptions
discard 2558c22 Fix missing dep
discard 7794f02 Add an empty spark-structured-streaming runner project
targeting spark 2.4.0
add 14a7a24 [BEAM-8470] Add an empty spark-structured-streaming runner
project targeting spark 2.4.0
add 0d314d2 [BEAM-8470] Fix missing dep
add 44aa39d [BEAM-8470] Add SparkPipelineOptions
add 8b1f07e [BEAM-8470] Start pipeline translation
add 5b7efb1 [BEAM-8470] Add global pipeline translation structure
add c3828ea [BEAM-8470] Add nodes translators structure
add c4c38b3 [BEAM-8470] Wire node translators with pipeline translator
add 2222e24 [BEAM-8470] Renames: better differentiate pipeline translator
from transform translator
add 1ffbf4e [BEAM-8470] Organise methods in PipelineTranslator
add 1c1bbab [BEAM-8470] Initialise BatchTranslationContext
add b8d4a96 [BEAM-8470] Refactoring: -move batch/streaming common
translation visitor and utility methods to PipelineTranslator -rename batch
dedicated classes to Batch* to differentiate with their streaming counterparts
-Introduce TranslationContext for common batch/streaming components
add fef01b3 [BEAM-8470] Make transform translation clearer: renaming,
comments
add 666c011 [BEAM-8470] Improve javadocs
add 5eeca80 [BEAM-8470] Move SparkTransformOverrides to correct package
add fd888deb [BEAM-8470] Move common translation context components to
superclass
add 4e94975 [BEAM-8470] apply spotless
add 12ed3f5 [BEAM-8470] Make codestyle and firebug happy
add fbc7fbc [BEAM-8470] Add TODOs
add 3f29bff [BEAM-8470] Postpone batch qualifier in all class names
for readability
add 79b4541 [BEAM-8470] Add precise TODO for multiple TransformTranslator
per transform URN
add 7c8cd47 [BEAM-8470] Added SparkRunnerRegistrar
add d757185 [BEAM-8470] Add basic pipeline execution. Refactor
translatePipeline() to return the translationContext on which we can run
startPipeline()
add a36bae0 [BEAM-8470] Create PCollections manipulation methods
add e1b7644 [BEAM-8470] Create Datasets manipulation methods
add ad26cec [BEAM-8470] Add Flatten transformation translator
add 129e95f [BEAM-8470] Add primitive GroupByKeyTranslatorBatch
implementation
add 6965842 [BEAM-8470] Use Iterators.transform() to return Iterable
add cce30c9 [BEAM-8470] Implement read transform
add 7901d73 [BEAM-8470] update TODO
add eb7c77e [BEAM-8470] Apply spotless
add cb9bb99 [BEAM-8470] start source instantiation
add 6bb32a5 [BEAM-8470] Improve exception flow
add 1970760 [BEAM-8470] Improve type enforcement in ReadSourceTranslator
add c863dca [BEAM-8470] Experiment over using spark Catalog to pass in
Beam Source through spark Table
add 2a960dd [BEAM-8470] Add source mocks
add be8344e [BEAM-8470] fix mock, wire mock in translators and create a
main test.
add 756eec3 [BEAM-8470] Use raw WindowedValue so that spark Encoders
could work (temporary)
add 4480578 [BEAM-8470] clean deps
add 26b2a91 [BEAM-8470] Move DatasetSourceMock to proper batch mode
add 5385a8f [BEAM-8470] Run pipeline in batch mode or in streaming mode
add 8051fac [BEAM-8470] Split batch and streaming sources and translators
add ee0ea0e [BEAM-8470] Use raw Encoder<WindowedValue> also in regular
ReadSourceTranslatorBatch
add ae75196 [BEAM-8470] Clean
add 4a59b09 [BEAM-8470] Add ReadSourceTranslatorStreaming
add 68f7230 [BEAM-8470] Move Source and translator mocks to a mock
package.
add 3368a9e [BEAM-8470] Pass Beam Source and PipelineOptions to the spark
DataSource as serialized strings
add c569f1b [BEAM-8470] Refactor DatasetSource fields
add db036ca [BEAM-8470] Wire real SourceTransform and not mock and update
the test
add 6cac8b2 [BEAM-8470] Add missing 0-arg public constructor
add dc615f0 [BEAM-8470] Use new PipelineOptionsSerializationUtils
add adfd237 [BEAM-8470] Apply spotless and fix checkstyle
add aeee309 [BEAM-8470] Add a dummy schema for reader
add e9b0488 [BEAM-8470] Add empty 0-arg constructor for mock source
add 9537add [BEAM-8470] Clean
add 8b232df [BEAM-8470] Checkstyle and Findbugs
add 344f69e [BEAM-8470] Refactor SourceTest to a UTest instead of a main
add 285aab4 [BEAM-8470] Fix pipeline triggering: use a spark action
instead of writing the dataset
add 244459f [BEAM-8470] improve readability of options passing to the
source
add 1102280 [BEAM-8470] Clean unneeded fields in DatasetReader
add b3796a2 [BEAM-8470] Fix serialization issues
add 1dc0352 [BEAM-8470] Add SerializationDebugger
add f5a0ba7 [BEAM-8470] Add serialization test
add 63aa493 [BEAM-8470] Move SourceTest to same package as tested class
add baea877 [BEAM-8470] Fix SourceTest
add bb7db53 [BEAM-8470] Simplify beam reader creation as it is created
once the source has already been partitioned
add 833df0e [BEAM-8470] Put all transform translators Serializable
add 3004db7 [BEAM-8470] Enable test mode
add 3dc7d95 [BEAM-8470] Enable gradle build scan
add 068f63d [BEAM-8470] Add flatten test
add 5dc598a [BEAM-8470] First attempt for ParDo primitive implementation
add 30cbbf4 [BEAM-8470] Serialize windowedValue to byte[] in source to be
able to specify a binary dataset schema and deserialize windowedValue from Row
to get a dataset<WindowedValue>
add 176342f [BEAM-8470] Comment schema choices
add 730eed3 [BEAM-8470] Fix errorprone
add 067e756 [BEAM-8470] Fix testMode output to comply with new binary
schema
add 1960559 [BEAM-8470] Cleaning
add 7e5399e [BEAM-8470] Remove bundleSize parameter and always use spark
default parallelism
add e63d794 [BEAM-8470] Fix split bug
add fc2239d [BEAM-8470] Clean
add f152deb [BEAM-8470] Add ParDoTest
add 7fd2de1 [BEAM-8470] Address minor review notes
add bd385ff [BEAM-8470] Clean
add bdeb934 [BEAM-8470] Add GroupByKeyTest
add 990e220 [BEAM-8470] Add comments and TODO to GroupByKeyTranslatorBatch
add 997043d [BEAM-8470] Fix type checking with Encoder of WindowedValue<T>
add 61bf40c [BEAM-8470] Port latest changes of ReadSourceTranslatorBatch
to ReadSourceTranslatorStreaming
add c8f2078 [BEAM-8470] Remove no more needed putDatasetRaw
add f7a008c [BEAM-8470] Add ComplexSourceTest
add 728dc73 [BEAM-8470] Fail in case of having SideInputs or State/Timers
add 0cd30d1 [BEAM-8470] Fix Encoders: create an Encoder for every
manipulated type to avoid Spark fallback to genericRowWithSchema and cast to
allow having Encoders with generic types such as WindowedValue<T> and get the
type checking back
add 2bea926 [BEAM-8470] Apply spotless
add a048341 [BEAM-8470] Fixed Javadoc error
add 8286afc [BEAM-8470] Rename SparkSideInputReader class and rename
pruneOutput() to pruneOutputFilteredByTag()
add 313bc66 [BEAM-8470] Don't use deprecated
sideInput.getWindowingStrategyInternal()
add 6657d7d [BEAM-8470] Simplify logic of ParDo translator
add 359a6b0 [BEAM-8470] Fix kryo issue in GBK translator with a workaround
add b7ec102 [BEAM-8470] Rename SparkOutputManager for consistency
add 9275c82 [BEAM-8470] Fix for test elements container in GroupByKeyTest
add b136699 [BEAM-8470] Added "testTwoPardoInRow"
add a9eed5b [BEAM-8470] Add a test for the most simple possible Combine
add 222ce45 [BEAM-8470] Rename SparkDoFnFilterFunction to
DoFnFilterFunction for consistency
add 65dcd0b [BEAM-8470] Generalize the use of SerializablePipelineOptions
in place of (not serializable) PipelineOptions
add d24bc86 [BEAM-8470] Fix getSideInputs
add 126ee79 [BEAM-8470] Extract binary schema creation in a helper class
add cb666d3 [BEAM-8470] First version of combinePerKey
add e282801 [BEAM-8470] Improve type checking of Tuple2 encoder
add f277be2 [BEAM-8470] Introduce WindowingHelpers (and helpers package)
and use it in Pardo, GBK and CombinePerKey
add 35f7128 [BEAM-8470] Fix combiner using KV as input, use binary
encoders in place of accumulatorEncoder and outputEncoder, use helpers, spotless
add ca25210 [BEAM-8470] Add combinePerKey and CombineGlobally tests
add f416e20 [BEAM-8470] Introduce RowHelpers
add 4f670a5 [BEAM-8470] Add CombineGlobally translation to avoid
translating Combine.perKey as a composite transform based on Combine.PerKey
(which uses low perf GBK)
add 071aab5 [BEAM-8470] Cleaning
add 2aff552 [BEAM-8470] Get back to classes in translators resolution
because URNs cannot translate Combine.Globally
add cd19577 [BEAM-8470] Fix various type checking issues in
Combine.Globally
add 3499784 [BEAM-8470] Update test with Long
add f94f5ca [BEAM-8470] Fix combine. For unknown reason
GenericRowWithSchema is used as input of combine so extract its content to be
able to proceed
add 4bb19ce [BEAM-8470] Use more generic Row instead of
GenericRowWithSchema
add c3b4686 [BEAM-8470] Add explanation about receiving a Row as input in
the combiner
add 82131f5 [BEAM-8470] Fix encoder bug in combinePerKey
add a781ec1 [BEAM-8470] Cleaning
add 0c53291 [BEAM-8470] Implement WindowAssignTranslatorBatch
add 75dc161 [BEAM-8470] Implement WindowAssignTest
add cb0e79a [BEAM-8470] Fix javadoc
add 95564cd [BEAM-8470] Added SideInput support
add d51e934 [BEAM-8470] Fix CheckStyle violations
add 0f1c9ff [BEAM-8470] Don't use Reshuffle translation
add c4b8e86 [BEAM-8470] Added using CachedSideInputReader
add 4c10a48 [BEAM-8470] Added TODO comment for ReshuffleTranslatorBatch
add ff2ed77 [BEAM-8470] Add unchecked warning suppression
add f5cbbda [BEAM-8470] Add streaming source initialisation
add f20a878 [BEAM-8470] Implement first streaming source
add 0863bf5 [BEAM-8470] Add a TODO on spark output modes
add 2470f3e [BEAM-8470] Add translators registry in
PipelineTranslatorStreaming
add 4ab7b03 [BEAM-8470] Add source streaming test
add 9f2caa8 [BEAM-8470] Specify checkpointLocation at the pipeline start
add 79f22a8 [BEAM-8470] Clean unneeded 0 arg constructor in batch source
add 850f56a [BEAM-8470] Clean streaming source
add e260811 [BEAM-8470] Continue impl of offsets for streaming source
add 45cd090 [BEAM-8470] Deal with checkpoint and offset based read
add 2070278 [BEAM-8470] Apply spotless and fix spotbugs warnings
add 52e8689 [BEAM-8470] Disable never ending test
SimpleSourceTest.testUnboundedSource
add 673731f [BEAM-8470] Fix access level issues, typos and modernize code
to Java 8 style
add f106a04 [BEAM-8470] Merge Spark Structured Streaming runner into main
Spark module. Remove spark-structured-streaming module. Rename Runner to
SparkStructuredStreamingRunner. Remove specific PipelineOptions and Registrar
for Structured Streaming Runner
add 1c82b5e [BEAM-8470] Fix non-vendored imports from Spark Streaming
Runner classes
add 53acbda [BEAM-8470] Pass doFnSchemaInformation to ParDo batch
translation
add 102f6fc [BEAM-8470] Fix spotless issues after rebase
add fecc40b [BEAM-8470] Fix logging levels in Spark Structured Streaming
translation
add f2dd748 [BEAM-8470] Add SparkStructuredStreamingPipelineOptions and
SparkCommonPipelineOptions - SparkStructuredStreamingPipelineOptions was added
to have the new runner rely only on its specific options.
add 5e4d878 [BEAM-8470] Rename SparkPipelineResult to
SparkStructuredStreamingPipelineResult This is done to avoid an eventual
collision with the one in SparkRunner. However this cannot happen at this
moment because it is package private, so it is also done for consistency.
add c9b3e51 [BEAM-8470] Use PAssert in Spark Structured Streaming
transform tests
add cbe80d4 [BEAM-8470] Ignore spark offsets (cf javadoc)
add 2427002 [BEAM-8470] implement source.stop
add 0031ef9 [BEAM-8470] Update javadoc
add 0c082b89 [BEAM-8470] Apply Spotless
add 9b7986a [BEAM-8470] Enable batch Validates Runner tests for
Structured Streaming Runner
add 7891116 [BEAM-8470] Limit the number of partitions to make tests go
300% faster
add 48eaf9d [BEAM-8470] Fixes ParDo not calling setup and not tearing
down if exception on startBundle
add 386080c [BEAM-8470] Pass transform based doFnSchemaInformation in
ParDo translation
add f780659 [BEAM-8470] Consider null object case on RowHelpers, fixes
empty side inputs tests.
add 355e95d [BEAM-8470] Put back batch/simpleSourceTest.testBoundedSource
add 2c49efc [BEAM-8470] Update windowAssignTest
add 0705e50 [BEAM-8470] Add comment about checkpoint mark
add 211cb98 [BEAM-8470] Re-code GroupByKeyTranslatorBatch to conserve
windowing instead of unwindowing/windowing(GlobalWindow): simplify code, use
ReduceFnRunner to merge the windows
add d52ef97 [BEAM-8470] re-enable reduceFnRunner timers for output
add 26fcb28 [BEAM-8470] Improve visibility of debug messages
add c5970f2 [BEAM-8470] Add a test that GBK preserves windowing
add 22abb05 [BEAM-8470] Add TODO in Combine translations
add dc4dda7 [BEAM-8470] Update KVHelpers.extractKey() to deal with
WindowedValue and update GBK and CPK
add 09bc7a1 [BEAM-8470] Fix comment about schemas
add 4c2585b [BEAM-8470] Implement reduce part of CombineGlobally
translation with windowing
add 7222d49 [BEAM-8470] Output data after combine
add a8ca39f [BEAM-8470] Implement merge accumulators part of
CombineGlobally translation with windowing
add 0bdc83c [BEAM-8470] Fix encoder in combine call
add 7237dca [BEAM-8470] Revert extractKey while combinePerKey is not done
(so that it compiles)
add a460052 [BEAM-8470] Applying a groupByKey avoids, for some reason,
the spark structured streaming framework casting data to Row, which makes it
impossible to deserialize without the coder shipped into the data. For
performance reasons (avoid memory consumption and having to deserialize), we do
not ship coder + data. Also add a mapPartitions before GBK to avoid shuffling
add 26f778e [BEAM-8470] Fix case when a window does not merge into any
other window
add 6f47514 [BEAM-8470] Fix wrong encoder in combineGlobally GBK
add d093cad [BEAM-8470] Fix bug in the window merging logic
add 5427856 [BEAM-8470] Remove the mapPartition that adds a key per
partition because otherwise spark will reduce values per key instead of globally
add c0e52c2 [BEAM-8470] Remove CombineGlobally translation because it is
less performant than the beam sdk one (key + combinePerKey.withHotkeyFanout)
add e7d3283 [BEAM-8470] Now that there is only Combine.PerKey
translation, make only one Aggregator
add 5a7bbf5 [BEAM-8470] Clean no more needed KVHelpers
add 96b147e [BEAM-8470] Clean no more needed RowHelpers
add d6683ad [BEAM-8470] Clean no more needed WindowingHelpers
add b19b792 [BEAM-8470] Fix javadoc of AggregatorCombiner
add c178172 [BEAM-8470] Fixed immutable list bug
add 98bc7a4 [BEAM-8470] add comment in combine globally test
add 85fe2d0 [BEAM-8470] Clean groupByKeyTest
add e205425 [BEAM-8470] Add a test that combine per key preserves
windowing
add b12e878 [BEAM-8470] Ignore for now not working test
testCombineGlobally
add 8215482 [BEAM-8470] Add metrics support in DoFn
add 35fef0f [BEAM-8470] Add missing dependencies to run Spark Structured
Streaming Runner on Nexmark
add dc1abf5 [BEAM-8470] Add setEnableSparkMetricSinks() method
add a7f04ed [BEAM-8470] Fix javadoc
add a6d0e58 [BEAM-8470] Fix accumulators initialization in Combine that
prevented CombineGlobally to work.
add 37dcb9a [BEAM-8470] Add a test to check that CombineGlobally
preserves windowing
add b59d8c5 [BEAM-8470] Persist all output Dataset if there are multiple
outputs in pipeline. Enabled Use*Metrics tests
add a004a56 [BEAM-8470] Added metrics sinks and tests
add 745ab6e [BEAM-8470] Make spotless happy
add 3391f8d [BEAM-8470] Add PipelineResults to Spark structured streaming.
add 7d7503a [BEAM-8470] Update log4j configuration
add 8c91c11 [BEAM-8470] Add spark execution plans extended debug messages.
add 7f71572 [BEAM-8470] Print number of leaf datasets
add 65c8457 [BEAM-8470] fixup! Add PipelineResults to Spark structured
streaming.
add f2388dc [BEAM-8470] Remove no more needed AggregatorCombinerPerKey
(there is only AggregatorCombiner)
add 134ee35 [BEAM-8470] After testing performance and correctness, launch
pipeline with dataset.foreach(). Make both test mode and production mode use
foreach for uniformity. Move dataset printing to a utility method
add a8e0ad9 [BEAM-8470] Improve Pardo translation performance: avoid
calling a filter transform when there is only one output tag
add c5b8799 [BEAM-8470] Use "sparkMaster" in local mode to obtain number
of shuffle partitions + spotless apply
add 5ce206b [BEAM-8470] Wrap Beam Coders into Spark Encoders using
ExpressionEncoder: serialization part
add 750037f [BEAM-8470] Wrap Beam Coders into Spark Encoders using
ExpressionEncoder: deserialization part
add 69aebb1 [BEAM-8470] type erasure: spark encoders require a Class<T>,
pass Object and cast to Class<T>
add 02a0f01 [BEAM-8470] Fix scala Product in Encoders to avoid
StackOverflow
add a0706d7 [BEAM-8470] Conform to spark ExpressionEncoders: pass
classTags, implement scala Product, pass children from within the
ExpressionEncoder, fix visibilities
add 3a87d37 [BEAM-8470] Add a simple Spark-native test for wrapping Beam
coders into Spark Encoders
add cca4034 [BEAM-8470] Fix code generation in Beam coder wrapper
add 8e341a1 [BEAM-8470] Lazy init coder because coder instance cannot be
interpolated by catalyst
add f1a555e [BEAM-8470] Fix warning in coder construction by reflection
add 482bcf6 [BEAM-8470] Fix ExpressionEncoder generated code: typos, try
catch, fqcn
add 44274ca [BEAM-8470] Fix getting the output value in code generation
add e2a134c [BEAM-8470] Fix beam coder lazy init using reflection: use
.class + try catch + cast
add cfff6f7 [BEAM-8470] Remove lazy init of beam coder because there is
no generic way of instantiating a beam coder
add 72ba1ea [BEAM-8470] Remove example code
add 83421c2 [BEAM-8470] Fix equal and hashcode
add dc5b243 [BEAM-8470] Fix generated code: uniform exception catching,
fix parentheses and variable declarations
add 6b31168 [BEAM-8470] Add an assert of equality in the encoders test
add ba33722 [BEAM-8470] Apply spotless and checkstyle and add javadocs
add 4055d11 [BEAM-8470] Wrap exceptions in UserCoderExceptions
add 2daed76 [BEAM-8470] Make Encoders expressions serializable
add e8463ce [BEAM-8470] Catch Exception instead of IOException because
some coders do not throw IOException at all (e.g. VoidCoder)
add 5ebc1e9 [BEAM-8470] Apply new Encoders to CombinePerKey
add 7e5b6df [BEAM-8470] Apply new Encoders to Read source
add cb82036 [BEAM-8470] Improve performance of source: the mapper already
calls windowedValueCoder.decode, no need to call it also in the Spark encoder
add 163b157 [BEAM-8470] Ignore long time failing test:
SparkMetricsSinkTest
add a7629eb [BEAM-8470] Apply new Encoders to Window assign translation
add 54fd760 [BEAM-8470] Apply new Encoders to AggregatorCombiner
add f1f2163 [BEAM-8470] Create a Tuple2Coder to encode scala tuple2
add fbda17a [BEAM-8470] Apply new Encoders to GroupByKey
add de92516 [BEAM-8470] Apply new Encoders to Pardo. Replace Tuple2Coder
with MultiOutputCoder to deal with multiple outputs, for use in the Spark
Encoder for DoFnRunner
add f180b4d [BEAM-8470] Apply spotless, fix typo and javadoc
add fbb8355 [BEAM-8470] Use beam encoders also in the output of the
source translation
add 2e1ed4b [BEAM-8470] Remove unneeded cast
add 9e7f118 [BEAM-8470] Fix: Remove generic hack of using object. Use
actual Coder encodedType in Encoders
add a0e6ca4 [BEAM-8470] Remove Encoders based on kryo now that we call
Beam coders in the runner
This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version. This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:
 * -- * -- B -- O -- O -- O (620a27a)
            \
             N -- N -- N   refs/heads/spark-runner_structured-streaming (a0e6ca4)
You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
Any revisions marked "omit" are not gone; other references still
refer to them. Any revisions marked "discard" are gone forever.
No new revisions were added by this update.
Summary of changes: