[
https://issues.apache.org/jira/browse/DRILL-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096949#comment-17096949
]
ASF GitHub Bot commented on DRILL-7554:
---------------------------------------
cgivre commented on a change in pull request #1962:
URL: https://github.com/apache/drill/pull/1962#discussion_r418259652
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/EasyEVFBatchReader.java
##########
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.easy;
+
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.hadoop.mapred.FileSplit;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.Period;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+
+
+/**
+ * Creating a format plugin often involves a great deal of cut-and-pasted code.
+ * The EasyEVFBatchReader is intended to let developers of new format plugins
+ * focus their energy on the code that is truly unique to each format.
+ * <p>
+ * To create a new format plugin, simply extend this class and override the
+ * open() method as shown in the snippet below. The code that is unique to each
+ * format is contained in the iterator class, which reads the file one record
+ * at a time.
+ * <p>
+ * With respect to schema creation, there are three basic situations:
+ * <ol>
+ * <li>Schema is known before opening the file</li>
+ * <li>Schema is known (and unchanging) after reading the first row of
+ * data</li>
+ * <li>Schema is completely flexible, i.e., not consistent and not known after
+ * the first row.</li>
+ * </ol>
+ *
+ * This class implements a series of methods to facilitate schema creation.
+ * However, to achieve the best possible performance, it is vital to use the
+ * correct methods to map the schema. Drill will perform fastest when the
+ * schema is known in advance and the writers can be stored in a data
+ * structure that minimizes the number of string comparisons.
+ *
+ * <b>Current state</b>
+ * For the third scenario, where the schema is not consistent, there is a
+ * collection of functions, all named {@code writeDataType()}, which accept a
+ * ScalarWriter, a column name, and a value. These functions first check
+ * whether the column has already been added to the schema and add it if not;
+ * the value is then written to the column.
+ *
+ * <p>
+ *
+ * <pre>
+ * public boolean open(FileSchemaNegotiator negotiator) {
+ * super.open(negotiator);
+ * super.fileIterator = new LTSVRecordIterator(getRowWriter(), reader);
+ * return true;
+ * }
+ * </pre>
+ *
+ */
+public abstract class EasyEVFBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(EasyEVFBatchReader.class);
+
+ public FileSplit split;
+
+ public EasyEVFIterator fileIterator;
Review comment:
Correct. I imagined these "Easy" classes being used for relatively easy formats, or for formats where there is some pre-existing Java parser. Thus the design pattern would basically be:
1. Instantiate the parser,
2. Wrap it in the easy iterator,
3. Map fields to columns,
etc.
This could not be used for JSON, HDF5 or anything more complicated like that.
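The three steps above can be sketched in a Drill-free toy form. Note that all class and method names below (`ToyLtsvParser`, `ToyRecordIterator`) are hypothetical illustrations, not Drill APIs; a real reader would write fields through a RowSetLoader rather than collect them in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Step 1: a stand-in for a pre-existing parser (hypothetical name).
class ToyLtsvParser {
    private final Iterator<String> lines;

    ToyLtsvParser(List<String> lines) {
        this.lines = lines.iterator();
    }

    // Parses one "key:value<TAB>key:value" record; returns null at end of input.
    Map<String, String> nextRecord() {
        if (!lines.hasNext()) {
            return null;
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (String field : lines.next().split("\t")) {
            int sep = field.indexOf(':');
            row.put(field.substring(0, sep), field.substring(sep + 1));
        }
        return row;
    }
}

// Step 2: wrap the parser in an iterator that the batch reader drives
// one record at a time.
class ToyRecordIterator {
    private final ToyLtsvParser parser;
    private final List<Map<String, String>> writtenRows = new ArrayList<>();

    ToyRecordIterator(ToyLtsvParser parser) {
        this.parser = parser;
    }

    // Step 3: map parsed fields to output columns. A real implementation
    // would route each field to the matching column writer; here we simply
    // collect the rows.
    boolean next() {
        Map<String, String> row = parser.nextRecord();
        if (row == null) {
            return false;  // end of input; tells the reader to stop
        }
        writtenRows.add(row);
        return true;
    }

    List<Map<String, String>> rows() {
        return writtenRows;
    }
}
```

The reader subclass would construct the parser and iterator in open(), then repeatedly call next() until it returns false.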
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Convert LTSV Format Plugin to EVF
> ---------------------------------
>
> Key: DRILL-7554
> URL: https://issues.apache.org/jira/browse/DRILL-7554
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Text & CSV
> Affects Versions: 1.17.0
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.18.0
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)