AlexanderKM opened a new pull request, #16635:
URL: https://github.com/apache/pinot/pull/16635

   
   ## Problem:
   
   When the new `createMetadataTarGz` option is enabled on the segment creation job, the Hadoop mapper [creates a tarball](https://github.com/apache/pinot/blob/master/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/src/main/java/org/apache/pinot/plugin/ingestion/batch/hadoop/HadoopSegmentCreationMapper.java#L192-L207) containing only the `metadata.properties` and `creation.meta` files, with both files at the top level of the archive:
   
   Tarball name: "segment_name_123.metadata.tar.gz"
   ```
   /
      metadata.properties
      creation.meta
   ```
      
   Note: without this new config option, the full segment tarball looks something like this:
   
   Tarball name: "segment_name_123.tar.gz"
   ```
   /
     /v3
       creation.meta
       index_map
       columns.psf
       metadata.properties
   ```
   Note the `v3/` subdirectory, which the slim metadata tarball does not have.
   
   **The problem is** that the controller does not distinguish between the new and old tarball formats and always _expects_ the `v3/` subdirectory.
   
   See code:
   - The [DefaultMetadataExtractor](https://github.com/apache/pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/metadata/DefaultMetadataExtractor.java#L38) expects the first file to be the index directory.
   - This breaks down later in the [SegmentDirectoryPaths](https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java#L65) util, which also expects the first file to be a directory: [see code](https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/store/SegmentDirectoryPaths.java#L169).
   
   As a result, metadata extraction fails with an error like `"Path: /tmp/xyz/metadata.properties is not a directory"`.
   
   ## The fix
   
   This PR adds a quick fix to metadata extraction so that it accepts either layout: a tarball with the `v3/` subdirectory, or the slim tarball with only `metadata.properties` and `creation.meta` at the top level.
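
For illustration only, the layout check described above could look roughly like the sketch below. It is not the code in this PR: the class `MetadataDirResolver` and its `resolve` method are hypothetical names, and it assumes the tarball has already been extracted into a directory whose contents match one of the two layouts shown earlier.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Pinot: picks the directory that holds metadata.properties. */
public final class MetadataDirResolver {
  private static final String V3_DIR = "v3";
  private static final String METADATA_FILE = "metadata.properties";

  private MetadataDirResolver() {
  }

  /**
   * Returns the v3/ subdirectory when the extracted root uses the full segment layout,
   * or the root itself when metadata.properties sits at the top level (slim layout).
   */
  public static Path resolve(Path extractedRoot) {
    Path v3 = extractedRoot.resolve(V3_DIR);
    // Full segment tarball: metadata.properties lives under v3/.
    if (Files.isDirectory(v3) && Files.isRegularFile(v3.resolve(METADATA_FILE))) {
      return v3;
    }
    // Slim metadata tarball: metadata.properties is directly under the root.
    if (Files.isRegularFile(extractedRoot.resolve(METADATA_FILE))) {
      return extractedRoot;
    }
    throw new IllegalStateException("No " + METADATA_FILE + " found under: " + extractedRoot);
  }
}
```

Checking for `v3/` first keeps the existing behavior for full segment tarballs, and the top-level check only applies when that subdirectory is absent.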



