[jira] [Commented] (DRILL-7601) Shift column conversion to reader from scan framework

ASF GitHub Bot (Jira) Tue, 10 Mar 2020 11:13:11 -0700


    [ https://issues.apache.org/jira/browse/DRILL-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056210#comment-17056210 ]


ASF GitHub Bot commented on DRILL-7601:
---------------------------------------

paul-rogers commented on pull request #1993: DRILL-7601: Shift column 
conversion to reader from scan framework
URL: https://github.com/apache/drill/pull/1993#discussion_r390511107
 
 

 ##########
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvTableProperties.java
 ##########
 @@ -547,4 +543,65 @@ public void testMessyQuotes() throws Exception {
         .build();
     RowSetUtilities.verify(expected, actual);
   }
+
+  private static final String[] trimData = {
+      " 10 , fred ",
+      " 20, wilma "
+    };
+
+  /**
+   * Keep leading and trailing whitespace when the trim property is not
+   * set. That property is currently only available via table properties.
+   */
+  @Test
+  public void testKeepWitespace() throws Exception {
+    try {
+      enableSchemaSupport();
+      String tablePath = buildTable("noTrim", trimData);
+      String sql = String.format("create schema (%s) " +
+          "for table %s PROPERTIES ('" +
+          TextFormatPlugin.HAS_HEADERS_PROP + "'='false', '" +
+          TextFormatPlugin.SKIP_FIRST_LINE_PROP + "'='false')",
+          COL_SCHEMA, tablePath);
+      run(sql);
+      RowSet actual = client.queryBuilder().sql(SELECT_ALL, tablePath).rowSet();
+
+      TupleMetadata expectedSchema = new SchemaBuilder()
+          .add("id", MinorType.INT)
+          .add("name", MinorType.VARCHAR)
+          .buildSchema();
+
+      // String-to-number conversion trims strings automatically
+      RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+          .addRow(10, " fred ")
+          .addRow(20, " wilma ")
+          .build();
+      RowSetUtilities.verify(expected, actual);
+    } finally {
+      resetSchemaSupport();
+    }
+  }
+
+  /**
+   * Trim leading and trailing whitespace. This setting is currently
+   * only available via table properties.
+   */
+  @Test
+  public void testTrimWitespace() throws Exception {
+    try {
+      enableSchemaSupport();
+      String tablePath = buildTable("trim", trimData);
+      String sql = String.format("create schema (%s) " +
+          "for table %s PROPERTIES ('" +
+          TextFormatPlugin.HAS_HEADERS_PROP + "'='false', '" +
+          TextFormatPlugin.SKIP_FIRST_LINE_PROP + "'='false', '" +
+          TextFormatPlugin.TRIM_WHITESPACE_PROP + "'='true')",
 
 Review comment:
   Added detail to the "Documentation" section of this PR. There will be a few 
more new properties for JSON; after that I'll figure out how to show them in the 
docs. Perhaps the provided schema section needs a table of which formats 
support a provided schema, which properties each format supports, and what 
each property does.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Shift column conversion to reader from scan framework
> -----------------------------------------------------
>
>                 Key: DRILL-7601
>                 URL: https://issues.apache.org/jira/browse/DRILL-7601
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.17.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.18.0
>
>
> At the time we implemented provided schemas with the text reader, the best 
> path forward appeared to be to perform column type conversions within the 
> scan framework, including deep in the column writer structure.
> Experience with other readers has shown that the text reader is a special 
> case: it always writes strings, which Drill-provided converters can parse 
> into other types. Other readers, however, are not so simple: they often have 
> their own source structures which must be mated to a column reader, and so 
> conversion is generally best done in the reader, where it can be specific to 
> the nuances of each reader.
> This ticket asks to restructure the conversion code to fit the 
> reader-does-conversion pattern.
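
For illustration only, the reader-does-conversion idea described above might be sketched as follows. This is not Drill code: the class ReaderConversionSketch, its methods, and the plain string type names are all hypothetical stand-ins, and the real conversion in the PR goes through Drill's own column writers. The sketch only mirrors the behavior the tests in the diff check: string-to-number conversion trims automatically, while string columns keep their whitespace unless trimming is requested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Apache Drill: a text "reader" that
// converts each field itself instead of delegating to a scan framework.
public class ReaderConversionSketch {

  // Convert one raw field to the declared column type.
  // The type names are plain strings here purely for illustration.
  static Object convert(String raw, String type, boolean trimWhitespace) {
    switch (type) {
      case "INT":
        // Numeric conversion trims regardless of the trim setting.
        return Integer.parseInt(raw.trim());
      case "VARCHAR":
        // Strings keep their whitespace unless trimming was requested.
        return trimWhitespace ? raw.trim() : raw;
      default:
        throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }

  // Split one delimited line and convert every field (no quoting support).
  static List<Object> readRow(String line, String[] types, boolean trimWhitespace) {
    String[] fields = line.split(",", -1);
    if (fields.length != types.length) {
      throw new IllegalArgumentException("Expected " + types.length
          + " fields but found " + fields.length);
    }
    List<Object> row = new ArrayList<>();
    for (int i = 0; i < fields.length; i++) {
      row.add(convert(fields[i], types[i], trimWhitespace));
    }
    return row;
  }

  public static void main(String[] args) {
    String[] types = {"INT", "VARCHAR"};
    // Same sample rows as trimData in the test above.
    System.out.println(readRow(" 10 , fred ", types, false));
    System.out.println(readRow(" 20, wilma ", types, true));
  }
}
```

Because the conversion lives next to the parsing code, a reader with a different source structure could swap in its own convert step without changing anything outside the reader, which is the motivation the ticket gives.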



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
