Prathamesh9284 commented on code in PR #692:
URL: https://github.com/apache/wayang/pull/692#discussion_r2837476392
##########
wayang-api/wayang-api-sql/src/main/java/org/apache/wayang/api/sql/sources/fs/JavaCSVTableSource.java:
##########
@@ -176,6 +182,50 @@ private static Stream<String> streamLines(final String path) {
}
+ /**
+ * Validates the CSV header for Calcite compatibility.
+ * Checks that the header is present, uses comma separators (not the data
+ * delimiter), and each column follows the 'name:type' format
+ * (e.g., 'id:int,name:string,email:string'). Note that Calcite hardcodes
+ * commas for header parsing, while data rows use Wayang's configurable
+ * separator (default ';').
+ *
+ * @param path the filesystem path to the CSV file
+ */
+ private void validateHeaderLine(final String path) {
+ final FileSystem fileSystem = FileSystems.getFileSystem(path).orElseThrow(
Review Comment:
@zkaoudi streamLines() is static and already handles opening the file and
creating the line iterator, so I considered it the natural place for header
validation. Because it is static, however, it cannot access the instance-level
separator, and we cannot change its signature or behavior to pass the separator
in or to expose the header.
Validating the header within the same file-open operation would therefore
require changing streamLines(), which @mspruc wanted to avoid in order to
preserve the existing definition and any external usages. As a result,
validateHeaderLine() opens the file a second time.
Is there a way to avoid the double file open while keeping the existing
structure intact? I’d appreciate your guidance.
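For discussion, here is a minimal sketch of one possible direction: a separate
helper that opens the source once, validates the first line, and returns the
remaining lines as a stream, leaving streamLines() untouched. Everything in it
is an assumption made for illustration: the class SingleOpenCsvSketch and the
methods validateHeader() and validatedDataLines() are hypothetical names, and a
plain java.io.Reader stands in for whatever Wayang's FileSystem returns.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Stream;

// Hypothetical sketch, not the Wayang implementation.
class SingleOpenCsvSketch {

    // Checks that the header is present and that every comma-separated
    // column has the form 'name:type' (e.g. 'id:int,name:string').
    static void validateHeader(final String header) {
        if (header == null || header.trim().isEmpty()) {
            throw new IllegalArgumentException("CSV header is missing");
        }
        // Limit -1 keeps trailing empty columns so 'id:int,' is rejected.
        for (final String column : header.split(",", -1)) {
            final String[] parts = column.trim().split(":");
            if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
                throw new IllegalArgumentException(
                        "Malformed header column: '" + column + "'");
            }
        }
    }

    // Reads and validates the header, then streams the data rows from the
    // same reader, so the underlying source is opened only once.
    static Stream<String> validatedDataLines(final Reader source) {
        final BufferedReader reader = new BufferedReader(source);
        try {
            validateHeader(reader.readLine());
        } catch (final IOException e) {
            closeQuietly(reader);
            throw new UncheckedIOException(e);
        } catch (final RuntimeException e) {
            closeQuietly(reader);
            throw e;
        }
        // The reader is released when the caller closes the stream.
        return reader.lines().onClose(() -> closeQuietly(reader));
    }

    private static void closeQuietly(final BufferedReader reader) {
        try {
            reader.close();
        } catch (final IOException ignored) {
            // Nothing useful can be done if closing fails.
        }
    }

    public static void main(final String[] args) {
        // Header uses commas; data rows use ';' as in the Javadoc above.
        final Reader csv = new StringReader("id:int,name:string\n1;alice\n2;bob\n");
        try (Stream<String> rows = validatedDataLines(csv)) {
            rows.forEach(System.out::println); // prints the two data rows
        }
    }
}
```

Whether a helper like this fits depends on how the iterator returned by
streamLines() is consumed elsewhere, so I am treating it only as a starting
point for the discussion.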