Re: [PR] [CsvIO] Implement CsvParseHelpers::mapFieldPositions method [beam]

via GitHub Fri, 12 Jul 2024 17:14:43 -0700


Abacn commented on code in PR #31870:
URL: https://github.com/apache/beam/pull/31870#discussion_r1676570793



##########
sdks/java/io/csv/src/main/java/org/apache/beam/sdk/io/csv/CsvIOParseHelpers.java:
##########
@@ -73,9 +75,34 @@ static void validate(CSVFormat format, Schema schema) {}
    * Build a {@link List} of {@link Schema.Field}s corresponding to the expected position of each
    * field within the CSV record.
    */
-  // TODO(https://github.com/apache/beam/issues/31718): implement method.
-  static List<Schema.Field> mapFieldPositions(CSVFormat format, Schema schema) {
-    return new ArrayList<>();
+  static Map<Integer, Schema.Field> mapFieldPositions(CSVFormat format, Schema schema) {
+    List<String> header = Arrays.asList(format.getHeader());
+    Map<Integer, Schema.Field> indexToFieldMap = new HashMap<>();
+    for (Schema.Field field : schema.getFields()) {
+      int index = getIndex(header, field);
+      if (index >= 0) {
+        indexToFieldMap.put(index, field);
+      }
+    }
+    return indexToFieldMap;
+  }
+
+  /**
+   * Obtains the expected index from the {@link CSVFormat}'s header matching a given {@link Schema.Field}.
+   */
+  private static int getIndex(List<String> header, Schema.Field field) {
+    String fieldName = field.getName();
+    boolean presentInHeader = header.contains(fieldName);

Review Comment:
   > This is done during pipeline construction, so performance will not be an issue.
   
   I see, sounds good
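
   For reference, if the header lookup ever did become a hot path, the header could be indexed once instead of being scanned per field. The snippet below is only a rough sketch, not the PR's code: `HeaderIndexSketch` is a hypothetical class, plain `String` names stand in for `Schema.Field`, and it assumes a field should resolve to the first matching header entry (the quoted hunk cuts off before showing how `getIndex` picks the index).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the helper under review; not part of Beam.
final class HeaderIndexSketch {

  // Maps header position -> field name, skipping fields absent from the header.
  // Assumption: plain Strings replace Schema.Field to keep the sketch self-contained.
  static Map<Integer, String> mapFieldPositions(String[] header, List<String> fieldNames) {
    // Build the name -> position lookup once, so each field costs one hash lookup.
    Map<String, Integer> positionByName = new HashMap<>();
    for (int i = 0; i < header.length; i++) {
      // putIfAbsent keeps the first occurrence of a duplicated header name.
      positionByName.putIfAbsent(header[i], i);
    }

    Map<Integer, String> indexToField = new HashMap<>();
    for (String name : fieldNames) {
      Integer index = positionByName.get(name);
      if (index != null) {
        indexToField.put(index, name);
      }
    }
    return indexToField;
  }

  public static void main(String[] args) {
    String[] header = {"id", "name", "email"};
    List<String> fields = Arrays.asList("email", "id", "missing");
    // "missing" is not in the header, so only two entries are printed.
    System.out.println(mapFieldPositions(header, fields));
  }
}
```

   Since this runs once at construction time on headers with a handful of columns, the linear scan in the PR is simpler and entirely reasonable; the map only pays off for very wide headers.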



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
