[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752087#comment-17752087
 ] 

ASF GitHub Bot commented on DRILL-8450:
---------------------------------------

mbeckerle commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1287251884


##########
contrib/format-xml/README.md:
##########
@@ -15,12 +15,15 @@ The default configuration is shown below:
   "extensions": [
     "xml"
   ],
+  "allTextMode": true,
   "dataLevel": 2
 }
 ```
 
 ## Data Types
-All fields are read as strings.  Nested fields are read as maps.  Future functionality could include support for lists.
+The XML reader has an `allTextMode` which, when set to `true` reads all data fields as strings.
+When set to `false`, Drill will attempt to infer data types.
+Nested fields are read as maps.  Future functionality could include support for lists.

Review Comment:
   Not really part of this change set, but I don't know what you are suggesting 
by "future functionality could include support for lists." I'd like to 
understand that plan/idea just as part of grokking all of this XML mapping. 



##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########
@@ -88,6 +96,40 @@ public class Typifier {
   // If a String contains any of these, try to evaluate it as an equation
   private static final char[] MathCharacters = new char[]{'+', '-', '/', '*', '='};
 
+  /**
+   * This function infers the Drill data type of unknown data.
+   * @param data The input text of unknown data type.
+   * @return A {@link MinorType} of the Drill data type.
+   */
+  public static MinorType typifyToDrill (String data) {
+    Entry<Class, String> result = Typifier.typify(data);
+    String dataType = result.getKey().getSimpleName();
+
+    // If the string is empty, return UNKNOWN

Review Comment:
   The next line of code contradicts this comment by returning VARCHAR. 
   (Unless VARCHAR == UNKNOWN, which is news to me.)
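
   One way the comment and the returned value could be made to agree is
   sketched below. This is only an illustration under stated assumptions:
   `SketchType` is a hypothetical stand-in for Drill's `MinorType`,
   `toSketchType` is an invented name rather than the PR's method, and the
   typified entry is passed in so the sketch does not depend on `Typifier`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

public class EmptyStringSketch {
  // Hypothetical stand-in for Drill's MinorType; only the values used here.
  public enum SketchType { VARCHAR, FLOAT8, BIT }

  // Maps an already-typified entry to a sketch type.
  public static SketchType toSketchType(String data, Entry<Class<?>, String> typified) {
    // If the string is null or empty, fall back to VARCHAR
    // (the comment now names the value that is actually returned).
    if (data == null || data.isEmpty()) {
      return SketchType.VARCHAR;
    }
    String simpleName = typified.getKey().getSimpleName();
    switch (simpleName) {
      case "Double":
        return SketchType.FLOAT8;
      case "Boolean":
        return SketchType.BIT;
      default:
        // Anything unrecognized is treated as text.
        return SketchType.VARCHAR;
    }
  }

  public static void main(String[] args) {
    Entry<Class<?>, String> typified = new SimpleEntry<>(Double.class, "1.5");
    System.out.println(toSketchType("1.5", typified));
    System.out.println(toSketchType("", new SimpleEntry<>(String.class, "")));
  }
}
```

   If UNKNOWN really is the intended result for empty input, the return
   statement would need to change instead of the comment.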





> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
>                 Key: DRILL-8450
>                 URL: https://issues.apache.org/jira/browse/DRILL-8450
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Format - XML
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin.  As in other 
> plugins, it adds a new configuration parameter, allTextMode, which, when 
> set to true, reads all data as strings.  The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans, 
> and strings.
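
As a rough illustration of the behavior described above, the following sketch infers one of the five listed types from a text value and returns a string type whenever allTextMode is true. It is not the PR's implementation: `InferenceSketch`, `SketchType`, and `infer` are hypothetical names, the enum only stands in for Drill's `MinorType`, and the ISO-8601 date and timestamp formats are an assumption.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

public class InferenceSketch {
  // Hypothetical stand-in for Drill's MinorType; only the values used here.
  public enum SketchType { VARCHAR, FLOAT8, DATE, TIMESTAMP, BIT }

  // Infers a type for one text value; allTextMode short-circuits to VARCHAR.
  public static SketchType infer(String text, boolean allTextMode) {
    if (allTextMode || text == null || text.trim().isEmpty()) {
      return SketchType.VARCHAR;
    }
    String value = text.trim();
    if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
      return SketchType.BIT;
    }
    try {
      // Note: parseDouble also accepts forms such as "NaN" or "1e3".
      Double.parseDouble(value);
      return SketchType.FLOAT8;
    } catch (NumberFormatException ignored) {
      // Not a number; try the next candidate type.
    }
    try {
      LocalDate.parse(value); // assumes ISO format, e.g. 2023-08-08
      return SketchType.DATE;
    } catch (DateTimeParseException ignored) {
      // Not a date; try the next candidate type.
    }
    try {
      LocalDateTime.parse(value); // assumes ISO format, e.g. 2023-08-08T12:30:00
      return SketchType.TIMESTAMP;
    } catch (DateTimeParseException ignored) {
      // Not a timestamp; fall through to text.
    }
    return SketchType.VARCHAR;
  }

  public static void main(String[] args) {
    System.out.println(infer("42", true));
    System.out.println(infer("42", false));
    System.out.println(infer("2023-08-08", false));
  }
}
```

Because integers are not among the listed types, the sketch reports whole numbers such as 42 as doubles.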



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
