nicoloboschi opened a new pull request #14805: URL: https://github.com/apache/pulsar/pull/14805
### Motivation

The OpenSearch high-level REST API client does not support Elastic 8 servers. Some hardcoded request fields are no longer supported in ES 8, and the server responds with an error ([full guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/migrating-8.0.html)).

For the ES Sink, the first problem I found was in the createIndex request, around the "type" field (`include_type_name` in the request, `type` in responses).

Elastic has a guide for migrating gradually to newer installations using [custom http headers](https://www.elastic.co/guide/en/elasticsearch/reference/current/rest-api-compatibility.html). We are using the OpenSearch client, which doesn't support the `"application/vnd.elasticsearch+json;compatible-with=7"` content-type.

The best solution is to migrate from the high-level REST client to the official [Elastic java client](https://www.elastic.co/guide/en/elasticsearch/client/java-api-client/current/index.html), which is Apache 2 licensed. See the [License page](https://www.elastic.co/pricing/faq/licensing):

>Update: The Java HLRC has been deprecated in 7.15.0 in favor of the Java API Client. The Java API Client is licensed under Apache 2.0.

The Elastic java-client is not compatible with OpenSearch. The solution is to keep both clients (OpenSearch high-level and Elastic java-client) and use the proper one based on the target server.

### Modifications

* Added a new configuration property, `compatibilityMode`, that accepts:
  * AUTO (default): discovers the server version and chooses the right client implementation
  * ELASTICSEARCH: forces the ES java client
  * ELASTICSEARCH_7: forces the OpenSearch client. It's better to use the OpenSearch implementation for ES7 since it is more "tested" in production and it is the current implementation (a conservative choice to avoid regressions while upgrading Pulsar)
  * OPENSEARCH: forces the OpenSearch client
* Created a new interface `RestClient` with the two different implementations.
* Created a new BulkProcessor API (very similar to the one in the high-level REST client) to handle bulk requests, with the following features:
  * Multi-threaded async requests
  * Threshold based on the number of operations
  * Threshold based on the byte size of operations
  * Periodic auto flush
* Moved all the tests (both unit and integration) that use a container to run against ES 7, ES 8, and OpenSearch Docker containers.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

This change is already covered by existing tests, such as:
* tests under `pulsar-io/elastic-search`
* Pulsar Sink integration tests

### Documentation

- [x] `doc`
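
The `compatibilityMode` options listed under Modifications can be read as a small decision table. The following is a minimal sketch of that table, not the code in this PR: `ClientKind`, `ClientSelector`, and the `serverIsOpenSearch` / `serverMajorVersion` parameters are hypothetical names, and the AUTO branch assumes that an Elasticsearch 7 server is routed to the OpenSearch client, in line with the reasoning given for `ELASTICSEARCH_7`.

```java
// The four values come from the PR description.
enum CompatibilityMode { AUTO, ELASTICSEARCH, ELASTICSEARCH_7, OPENSEARCH }

// Hypothetical names for the two RestClient implementations.
enum ClientKind { ELASTIC_JAVA_CLIENT, OPENSEARCH_HIGH_LEVEL_CLIENT }

final class ClientSelector {
    private ClientSelector() {}

    /**
     * Picks a client for the configured mode.
     * serverIsOpenSearch and serverMajorVersion stand in for whatever the
     * AUTO mode discovers from the server (assumed inputs, not the PR's API).
     */
    static ClientKind select(CompatibilityMode mode,
                             boolean serverIsOpenSearch,
                             int serverMajorVersion) {
        switch (mode) {
            case ELASTICSEARCH:
                return ClientKind.ELASTIC_JAVA_CLIENT;
            case ELASTICSEARCH_7:
            case OPENSEARCH:
                return ClientKind.OPENSEARCH_HIGH_LEVEL_CLIENT;
            case AUTO:
            default:
                // Assumption: only Elasticsearch 8+ needs the Elastic java-client.
                if (!serverIsOpenSearch && serverMajorVersion >= 8) {
                    return ClientKind.ELASTIC_JAVA_CLIENT;
                }
                return ClientKind.OPENSEARCH_HIGH_LEVEL_CLIENT;
        }
    }
}
```

For example, `ClientSelector.select(CompatibilityMode.AUTO, false, 8)` would pick the Elastic java-client under these assumptions, while any OpenSearch server would keep the OpenSearch client.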

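
The flush triggers of the BulkProcessor API described under Modifications (operation count, byte size, periodic flush) can be illustrated with a small buffer. This is a simplified sketch under stated assumptions, not the PR's implementation: `ThresholdBatcher` and its methods are hypothetical names, operations are modeled as raw byte arrays, and the flush handler runs synchronously on the calling thread rather than issuing multi-threaded async requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical class: buffers operations and flushes on count, size, or timer.
final class ThresholdBatcher {
    private final int maxOperations;
    private final long maxBytes;
    private final Consumer<List<byte[]>> flushHandler;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes;

    ThresholdBatcher(int maxOperations, long maxBytes,
                     Consumer<List<byte[]>> flushHandler) {
        this.maxOperations = maxOperations;
        this.maxBytes = maxBytes;
        this.flushHandler = flushHandler;
    }

    // Adds one operation and flushes as soon as either threshold is reached.
    synchronized void add(byte[] operation) {
        pending.add(operation);
        pendingBytes += operation.length;
        if (pending.size() >= maxOperations || pendingBytes >= maxBytes) {
            flush();
        }
    }

    // Hands the buffered operations to the handler; no-op when empty.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        flushHandler.accept(batch);
    }

    // Periodic auto flush: drains whatever is buffered at a fixed interval.
    ScheduledFuture<?> startPeriodicFlush(ScheduledExecutorService scheduler,
                                          long intervalMillis) {
        return scheduler.scheduleAtFixedRate(
                this::flush, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }
}
```

With `new ThresholdBatcher(3, 100, handler)`, the handler would receive a batch after the third small operation, or immediately after a single operation of 100 bytes or more. A real implementation would also need to handle request failures and shut the scheduler down on close.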