AlexanderKM opened a new pull request, #16635: URL: https://github.com/apache/pinot/pull/16635
## Problem:
When using the new `createMetadataTarGz` option on the segment creation job, the Hadoop mapper [creates a tarball](https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/src/main/java/org/apache/pinot/plugin/ingestion/batch/hadoop/HadoopSegmentCreationMapper.java#L192-L207) containing only the `metadata.properties` and `creation.meta` files, with both files right at the top level:

tarball name: "segment_name_123.metadata.tar.gz"
```
/
  metadata.properties
  creation.meta
```

Note: without this new config option, the whole segment tarball looks something like:

name: "segment_name_123.tar.gz"
```
/
  /v3
    creation.meta
    index_map
    columns.psf
    metadata.properties
```
^ note the `v3/` sub directory

**The problem is** that the controller does not distinguish whether the tarball is in the new or the old format, and _expects_ the `v3/` sub directory. See code:
- The [DefaultMetadataExtractor](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/metadata/DefaultMetadataExtractor.java#L38) expects the first file to be the index directory.
- This breaks down later in the [SegmentDirectoryPaths](https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java#L65) util, which also expects the first file to be a directory: [see code](https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java#L169)

As a result, we end up seeing an error like `"Path: /tmp/xyz/metadata.properties is not a directory"`.

## The fix
This puts in a quick fix when extracting metadata so that the tarball may either contain the `v3/` sub directory, or be the slim tarball with only `metadata.properties` and `creation.meta` at the top level.
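
For illustration only, the fallback described above could look roughly like the sketch below. This is not the code from this PR: `MetadataDirResolver` and `resolveMetadataDir` are hypothetical names, and the sketch assumes the untarred contents match exactly one of the two layouts drawn in the description (files at the top level, or files under `v3/`).

```java
import java.io.File;

/**
 * Hypothetical helper (not part of Pinot): finds the directory that directly
 * contains metadata.properties inside an already-untarred segment tarball.
 */
final class MetadataDirResolver {
  static final String METADATA_FILE = "metadata.properties";
  static final String V3_SUBDIR = "v3";

  private MetadataDirResolver() {
  }

  /**
   * Accepts both layouts from the description:
   * slim tarball -> root/metadata.properties,
   * full tarball -> root/v3/metadata.properties.
   */
  static File resolveMetadataDir(File untarredRoot) {
    if (!untarredRoot.isDirectory()) {
      throw new IllegalArgumentException("Path: " + untarredRoot + " is not a directory");
    }
    // Slim layout: the metadata file sits at the top level.
    if (new File(untarredRoot, METADATA_FILE).isFile()) {
      return untarredRoot;
    }
    // Full layout: the metadata file sits under the v3/ sub directory.
    File v3Dir = new File(untarredRoot, V3_SUBDIR);
    if (new File(v3Dir, METADATA_FILE).isFile()) {
      return v3Dir;
    }
    throw new IllegalStateException("No " + METADATA_FILE + " found under " + untarredRoot);
  }
}
```

Checking for the file itself, rather than assuming the first entry is a directory, is what lets a caller avoid the "is not a directory" error for the slim tarball while still handling the `v3/` layout.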
