[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004106#comment-16004106 ]
ASF GitHub Bot commented on BEAM-1438:
--------------------------------------

GitHub user reuvenlax reopened a pull request:

    https://github.com/apache/beam/pull/1952

    BEAM-1438 Auto shard streaming sinks

    If a Write requests runner-determined sharding, the default is per-bundle sharding (one output shard per bundle), which performs poorly in Dataflow's streaming runner. With this change, the streaming runner instead statically picks the number of shards based on the number of workers.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/reuvenlax/incubator-beam streaming_auto_shard_write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/1952.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1952

----
commit f4dfbb206382d3ea73881727aa8b0f74eaf98ef4
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T02:31:22Z

    Annotate internal methods of PCollection

commit c1b26a1b53c334ab171fad60501ba67593fde5d2
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T02:48:38Z

    Annotate internal pieces of sdks.transforms

commit 49cf433c5c08f3cc91512aa9544a36a5d3e84333
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T02:59:32Z

    Tighten access control and internal annotations for triggers

commit 9b8a4e5c4b876d4459c64a9bffee613aeae72fb2
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T03:05:34Z

    The transforms.reflect package is not for users

commit fe51cc0d1a8aa14adbee81b220f9ca8a442f26fe
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T03:05:45Z

    Annotate internal-only bits of Java sdk.runners

commit 58298d866fe9d1f4fcaf2ccda3078809f4d55b27
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T17:10:07Z

    Tighten access in sdk.options

commit 362d0be79222ad67f1639d54434c1505ef76752b
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-03T17:13:15Z

    Annotate internal methods on Pipeline

commit f43b61af4d5a3ee77a610d8b11ef80d421c34501
Author: Kenneth Knowles <k...@google.com>
Date: 2017-05-04T13:10:45Z

    This closes #2852: Tighten up access and use internal annotations a bit in the Java SDK

      Annotate internal methods on Pipeline
      Tighten access in sdk.options
      Annotate internal-only bits of Java sdk.runners
      The transforms.reflect package is not for users
      Tighten access control and internal annotations for triggers
      Annotate internal pieces of sdks.transforms
      Annotate internal methods of PCollection

commit 1f1c897264ea7ab050c8644344f6e2648af9ae4a
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T00:17:11Z

    [BEAM-2165] Update Apex to support serializing/deserializing custom user types configured via Jackson modules

commit 02b72d6644c07b72a4c977a6cb16d59ec5a0ed8c
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T14:16:29Z

    [BEAM-2165] Update Apex to support serializing/deserializing custom user types configured via Jackson modules

    This closes #2880

commit e5729b58330a05e7be510710d0027c004704946b
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T00:19:00Z

    [BEAM-2165] Update Dataflow to support serializing/deserializing custom user types configured via Jackson modules

    This also updates the runner harness and existing tests to use a properly constructed ObjectMapper for PipelineOptions.
commit 749b33f0b74a9bcd3daf03ea7f9b4579baec2651
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T14:27:17Z

    [BEAM-2165] Update Dataflow to support serializing/deserializing custom user types configured via Jackson modules

    This closes #2881

commit f53e5d43d58c79ab9f3d04e112e6f05ad9dfe42f
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T00:12:20Z

    [BEAM-2165] Update Flink to support serializing/deserializing custom user types configured via Jackson modules

commit 3c5891b31d8dbeafad0a6ffbea33afb92c01c374
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T14:29:28Z

    [BEAM-2165] Update Flink to support serializing/deserializing custom user types configured via Jackson modules

    This closes #2879

commit cc654f02e8670ea789aee67508c569e7547ef11f
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-03T20:48:07Z

    [BEAM-1871] Migrate ReleaseInfo away from Google API client GenericJson

commit 98e92a0b8a4655a05fce4ae699f5bb93fe74f1de
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T14:41:15Z

    [BEAM-1871] Migrate ReleaseInfo away from Google API client GenericJson

    This closes #2868

commit 8a2dcdb6f9d4839c864a2c46c4b5254d0c7d4760
Author: Dan Halperin <dhalp...@google.com>
Date: 2017-05-03T18:52:02Z

    DataflowRunner: integration test GCP-IO

    Triggered under `-DskipITs=false -Pdataflow-runner`

commit e1d4aa96338959a556c8b815ccb6b1aae118ad15
Author: Dan Halperin <dhalp...@google.com>
Date: 2017-05-04T14:59:38Z

    This closes #2870

commit 1671708340fb9fc57cdc91c3bbacdff3ae6af4af
Author: yangping.wu <yangping...@qunar.com>
Date: 2017-05-04T06:04:08Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

commit 588f57a1e6771883df84d06087a93fa4fc4baa54
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T15:48:23Z

    [BEAM-1491]Identify HADOOP_CONF_DIR(or YARN_CONF_DIR) environment variables

    This closes #2890

commit fba3d87ffec08f84c8be08ee16942b13364da2d9
Author: Robert Bradshaw <rober...@google.com>
Date: 2017-05-03T21:56:37Z

    Split Coder's encode/decode methods into two methods depending on context.

    This allows the outer context to be marked deprecated. A follow-up PR will remove the old method once all consumers have been updated.

commit d9293007d065c82111bf449502b5466042dc6335
Author: Luke Cwik <lc...@google.com>
Date: 2017-05-04T15:59:05Z

    [BEAM-2166] Split Coder's encode/decode methods into two methods depending on context.

    This closes #2871

commit 690ec3b1f7b6ce9caaa7b9e401878e136f44bc50
Author: bchambers <bchamb...@google.com>
Date: 2017-05-03T23:40:09Z

    [BEAM-2162] Add logging to long BigQuery jobs

commit ade5cbea605b99ebb6e566491ec64e12fc1a663d
Author: Dan Halperin <dhalp...@google.com>
Date: 2017-05-04T16:00:36Z

    This closes #2882

commit 17ad1efe7355b238efb5e341487a8e22660b3b77
Author: Borisa Zivkovic <borisa.zivko...@huawei.com>
Date: 2017-05-03T15:22:18Z

    Use BinaryCombineLongFn in GroupIntoBatches

commit d1afdd8e14b0a62368e0573ffbaffeac14997e2e
Author: Thomas Groh <tg...@google.com>
Date: 2017-05-04T16:14:42Z

    This closes #2859

commit 70dad36f099ea0b454e2900302f7e7f866579f79
Author: Sourabh Bajaj <sourabhba...@google.com>
Date: 2017-05-03T20:50:46Z

    [BEAM-2152] Remove gcloud auth as application default credentials does it

commit 93020941a251bb62fc26f5e123a12df4f8e4ab1e
Author: Ahmet Altay <al...@google.com>
Date: 2017-05-04T16:27:43Z

    This closes #2869

commit c102d277e22cef8001c0f78d3a5ed00000e8d99d
Author: Dan Halperin <dhalp...@google.com>
Date: 2017-05-04T00:50:20Z

    AvroIOTest: stop using IOChannelUtils, remove invalid test

commit e5a38ed2610b8ef72192e5a1b9a5630578300164
Author: Dan Halperin <dhalp...@google.com>
Date: 2017-05-04T00:55:32Z

    DataflowRunner: switch from IOChannels to FileSystems for creating files

----

> The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1438
>                 URL: https://issues.apache.org/jira/browse/BEAM-1438
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>
> If a Write specifies 0 output shards, that implies the runner should pick an
> appropriate sharding. The default behavior is to write one shard per input
> bundle. This works well with the Dataflow batch runner, but not with the
> streaming runner which produces large numbers of small bundles.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
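The trade-off described in BEAM-1438 can be illustrated with a small, self-contained Java simulation. This is a toy model, not Beam code and not the implementation in PR #1952: it only counts how many output files one window would produce under per-bundle sharding versus a static shard count derived from the number of workers. The class and method names, the `SHARDS_PER_WORKER` multiplier, the 5-worker setting, and the round-robin record assignment are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy simulation (not Beam code) contrasting per-bundle sharding with a
 * static shard count derived from the number of workers.
 */
public class ShardingSketch {

    // Hypothetical multiplier chosen for illustration only; it is NOT taken
    // from the Dataflow runner or from PR #1952.
    static final int SHARDS_PER_WORKER = 2;

    /** Per-bundle sharding: each input bundle is written as its own shard (file). */
    static int filesWithPerBundleSharding(List<Integer> bundleSizes) {
        return bundleSizes.size();
    }

    /** Static sharding: the shard count depends only on the worker count, never on bundling. */
    static int staticShardCount(int numWorkers) {
        return Math.max(1, numWorkers * SHARDS_PER_WORKER);
    }

    /** Spreads numRecords across numShards round-robin and returns the record count per shard. */
    static int[] roundRobinShardSizes(int numRecords, int numShards) {
        int[] counts = new int[numShards];
        for (int i = 0; i < numRecords; i++) {
            counts[i % numShards]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Streaming-style input for one window: many tiny bundles of 3 records each.
        List<Integer> bundleSizes = new ArrayList<>();
        for (int b = 0; b < 1000; b++) {
            bundleSizes.add(3);
        }
        int totalRecords = bundleSizes.stream().mapToInt(Integer::intValue).sum();

        int perBundleFiles = filesWithPerBundleSharding(bundleSizes);
        int numShards = staticShardCount(5); // hypothetical: assume 5 workers
        int[] shardSizes = roundRobinShardSizes(totalRecords, numShards);

        System.out.printf("Per-bundle sharding: %d files of 3 records each%n", perBundleFiles);
        System.out.printf("Static sharding: %d files, shard 0 holds %d records%n",
                numShards, shardSizes[0]);
    }
}
```

Running `main` reports 1000 files under per-bundle sharding and 10 files of 300 records each under the static scheme. The point is only that the file count under static sharding is bounded by the worker-derived shard count rather than by how finely the streaming runner happens to split the input into bundles.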