[
https://issues.apache.org/jira/browse/BEAM-12601?focusedWorklogId=633804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-633804
]
ASF GitHub Bot logged work on BEAM-12601:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 04/Aug/21 20:24
Start Date: 04/Aug/21 20:24
Worklog Time Spent: 10m
Work Description: echauchot commented on pull request #15257:
URL: https://github.com/apache/beam/pull/15257#issuecomment-892949942
> I wonder if we should move towards (please don't hate me for suggesting it
> Etienne) pre-commit unit tests that don't make use of ES itself but analyze
> the resulting in-memory PCollection contents to ensure that what's been
> produced is as expected. We could still employ post-commit/regression tests
> that use a real ES instance, but this could de-flake the pre-commit unit tests?
Don't worry Evan, I do agree: ES tests have been flaky for years because
embedded ES is sensitive to load. We tried to reduce flakiness with test
containers (thanks for your work on that), but some flakiness remains. Flaky
UTests are painful for the build, so they are painful for the whole dev
process. So now is the time to draw a line: decide which misbehavior we are
confident the UTests can spot, and leave the rest to ITests. In that case,
however, these IO ITests need to run as part of each PR, which is not done
right now: e.g. CassandraIOIT and ESIOIT are run only on demand, mainly for
load tests.
Issue Time Tracking
-------------------
Worklog Id: (was: 633804)
Time Spent: 3h 10m (was: 3h)
> Support append-only indices in ES output
> -----------------------------------------
>
> Key: BEAM-12601
> URL: https://issues.apache.org/jira/browse/BEAM-12601
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Andres Rodriguez
> Priority: P2
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> Currently, the Apache Beam Elasticsearch sink is
> [using|https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L1532]
> the
> [index|https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-request-body]
> bulk API operation to add data to the target index. When using append-only
> indices it is better to use the
> [create|https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-request-body]
> operation. This also applies to new append-only indexes, like [data
> streams|https://www.elastic.co/guide/en/elasticsearch/reference/7.x/use-a-data-stream.html#add-documents-to-a-data-stream].
> The scope of this improvement is to add a new boolean configuration option,
> {{append-only}}, to the Elasticsearch sink, with a default value of {{false}}
> (to keep the current behaviour), that, when enabled, makes the sink use the
> {{create}} operation instead of the {{index}} one when sending data.
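
The difference between the two bulk operations described above can be
sketched as follows. This is only an illustration of the documented
newline-delimited JSON bulk request body, not the ElasticsearchIO
implementation; the function name bulk_body and its append_only parameter are
hypothetical, chosen to mirror the proposed option.

```python
import json


def bulk_body(docs, append_only=False):
    """Build a newline-delimited JSON body for the Elasticsearch bulk API.

    Hypothetical helper for illustration only. Each document is preceded by
    an action line: "index" adds or overwrites a document, while "create"
    fails if a document with the same ID already exists.
    """
    action = "create" if append_only else "index"
    lines = []
    for doc in docs:
        # Empty metadata: the target index is taken from the request path
        # and Elasticsearch generates the document ID.
        lines.append(json.dumps({action: {}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [{"message": "first"}, {"message": "second"}]
    print(bulk_body(docs))                    # current behaviour: "index"
    print(bulk_body(docs, append_only=True))  # proposed option: "create"
```

Defaulting append_only to False keeps the existing "index" behaviour, which
matches the default proposed in the issue.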