travis-cook-sfdc commented on issue #10611:
URL: https://github.com/apache/pinot/issues/10611#issuecomment-1507851543

   I spent a little more time with this and now understand why 4️⃣ was an issue.
   
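   ```java
   import java.io.IOException;
   import java.nio.file.FileSystems;
   import java.nio.file.Path;
   import java.nio.file.PathMatcher;
   import java.nio.file.Paths;

   public class FileTest {

       // Prints whether `path` matches the given "syntax:pattern" string (e.g. "regex:...")
       public static void matches(Path path, String syntaxAndPattern) {
           PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
           System.out.println(matcher.matches(path));
       }

       public static void main(String[] args) throws IOException {
           // Paths.get collapses the "//" after "s3:" into a single "/"
           Path path = Paths.get("s3://redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=2023-03-02/part-00000-d60ed2b8-30cd-4e7c-82e0-309f854991f5.c000.gz.parquet");
           System.out.println(path.toString());
           // Pattern written against the original "s3://" URI
           matches(path, "regex:^s3://redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=(2023-03-02)/.*[.]parquet$");
           // Pattern written against the normalized "s3:/" form
           matches(path, "regex:^s3:/redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=(2023-03-02)/.*[.]parquet$");
       }
   }

   // Run in jshell:
   FileTest.main(new String[] {})
   ```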
   
   This prints:
   ```
   s3:/redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=2023-03-02/part-00000-d60ed2b8-30cd-4e7c-82e0-309f854991f5.c000.gz.parquet
   false
   true
   ```
   
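   Because Pinot matches the regex against a Java `Path` object using `getPathMatcher`, and constructing a Java `Path` collapses `//` into `/`, the regex patterns supplied for ingestion must be written against the normalized form. A pattern that starts with `s3://` will not match, as the first `false` above shows.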
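   One possible workaround, sketched here under stated assumptions rather than as an official Pinot recommendation, is to write the pattern so that it accepts either form, e.g. `s3:/+` instead of `s3://`. The `bucket/prefix` path, the class name `SlashTolerantRegex`, and its helper methods are hypothetical placeholders. The expected outputs in the comments assume a Unix-like default `FileSystem` (the same one that produced the normalized path above); behaviour on other platforms may differ.

   ```java
   import java.nio.file.FileSystems;
   import java.nio.file.Path;
   import java.nio.file.PathMatcher;
   import java.nio.file.Paths;
   import java.util.regex.Pattern;

   public class SlashTolerantRegex {

       // Hypothetical pattern: "/+" after the scheme accepts "/" or "//", so it
       // works whether or not the double slash has already been collapsed.
       static final String TOLERANT_REGEX =
               "^s3:/+bucket/prefix/ds=(2023-03-02)/.*[.]parquet$";

       // Matches through PathMatcher, i.e. against the normalized Path.toString().
       static boolean matchesPath(Path path, String regex) {
           PathMatcher matcher = FileSystems.getDefault().getPathMatcher("regex:" + regex);
           return matcher.matches(path);
       }

       // Matches the raw, un-normalized URI string directly.
       static boolean matchesRaw(String uri, String regex) {
           return Pattern.compile(regex).matcher(uri).matches();
       }

       public static void main(String[] args) {
           String uri = "s3://bucket/prefix/ds=2023-03-02/part-00000.gz.parquet";
           Path path = Paths.get(uri);

           System.out.println(path);                              // s3:/bucket/prefix/ds=2023-03-02/part-00000.gz.parquet
           System.out.println(matchesPath(path, TOLERANT_REGEX)); // true
           System.out.println(matchesRaw(uri, TOLERANT_REGEX));   // true
       }
   }
   ```

   The `/+` form keeps a single pattern valid for both the original URI and the collapsed `Path` string, at the cost of also accepting three or more slashes, which is usually harmless for this kind of file filter.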
   
   I think it would be useful to clean up the documentation significantly.

