travis-cook-sfdc commented on issue #10611:
URL: https://github.com/apache/pinot/issues/10611#issuecomment-1507851543
I spent a little bit more time with this and now understand why 4️⃣ was an
issue.
```java
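import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class FileTest {
    // Prints whether the path matches the given "syntax:pattern" string.
    public static void matches(Path path, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(glob);
        System.out.println(matcher.matches(path));
    }

    public static void main(String[] args) {
        Path path = Paths.get(
            "s3://redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=2023-03-02/part-00000-d60ed2b8-30cd-4e7c-82e0-309f854991f5.c000.gz.parquet");
        // The Path has already collapsed "s3://" to "s3:/" at this point.
        System.out.println(path.toString());
        // Pattern written with "s3://": does not match the Path.
        matches(path,
            "regex:^s3://redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=(2023-03-02)/.*[.]parquet$");
        // Pattern written with "s3:/": matches the Path.
        matches(path,
            "regex:^s3:/redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=(2023-03-02)/.*[.]parquet$");
    }
}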
FileTest.main(new String[] {})
```
Returns
```
s3:/redactedCompanyName/metrics_rollup_dev/redactedTableName/v/4/ds=2023-03-02/part-00000-d60ed2b8-30cd-4e7c-82e0-309f854991f5.c000.gz.parquet
false
true
```
Because Pinot matches the regex against a Java `Path` object via
`getPathMatcher`, and a Java `Path` collapses `//` into `/`, any regex sent
for ingestion has to be written against the single-slash form (`s3:/`, not
`s3://`).
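As a rough sketch of one way to avoid the trap, the pattern can accept one or more slashes after the scheme (`s3:/+`), so it matches the collapsed `Path` string. This is only an illustration, not something the Pinot docs recommend: the class name, the `normalized` and `matches` helpers, and the `exampleBucket` path are all made up here, and it assumes a Unix-like default filesystem (the one the output above came from).

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class PathRegexSketch {
    // Hypothetical helper: the string form of the Path, which is what a
    // "regex:" matcher ends up being tested against on a Unix-like filesystem.
    static String normalized(String raw) {
        return Paths.get(raw).toString();
    }

    // Hypothetical helper: true if the Path built from `raw` matches the
    // given "syntax:pattern" string.
    static boolean matches(String raw, String syntaxAndPattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(syntaxAndPattern);
        return matcher.matches(Paths.get(raw));
    }

    public static void main(String[] args) {
        // Made-up example location, not a real bucket.
        String raw = "s3://exampleBucket/table/ds=2023-03-02/part-0.parquet";

        // Shows the collapsed form that the pattern has to match.
        System.out.println(normalized(raw));

        // Written with "s3://": misses the collapsed Path.
        System.out.println(matches(raw,
            "regex:^s3://exampleBucket/table/ds=2023-03-02/.*[.]parquet$"));

        // Written with "s3:/+": tolerates one or more slashes after the scheme.
        System.out.println(matches(raw,
            "regex:^s3:/+exampleBucket/table/ds=2023-03-02/.*[.]parquet$"));
    }
}
```

Printing `normalized(...)` for a sample file before writing the ingestion pattern is a quick way to see exactly what string the regex will be compared with.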
I think it would be useful to clean up the documentation significantly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]