[
https://issues.apache.org/jira/browse/DRILL-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955623#comment-16955623
]
ASF GitHub Bot commented on DRILL-4303:
---------------------------------------
paul-rogers commented on pull request #1858: DRILL-4303: ESRI Shapefile (shp) Format Plugin
URL: https://github.com/apache/drill/pull/1858#discussion_r336791381
##########
File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpFormatPlugin.java
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.esri;
+
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.store.esri.ShpBatchReader.ShpReaderConfig;
+import org.apache.hadoop.conf.Configuration;
+
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class ShpFormatPlugin extends EasyFormatPlugin<ShpFormatConfig> {
+
+ private static final Logger logger = LoggerFactory.getLogger(ShpFormatPlugin.class);
+
+ public static final String PLUGIN_NAME = "shp";
+
+ public static class ShpReaderFactory extends FileReaderFactory {
+ private final ShpReaderConfig readerConfig;
+
+ public ShpReaderFactory(ShpReaderConfig config) {
+ readerConfig = config;
+ }
+
+ @Override
+ public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newReader() {
+ return new ShpBatchReader(readerConfig);
+ }
+ }
+
+ public ShpFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig, ShpFormatConfig formatConfig) {
+ super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+ }
+
+ @Override
+ public ManagedReader<? extends FileSchemaNegotiator> newBatchReader(EasySubScan scan, OptionManager options) throws ExecutionSetupException {
+ return new ShpBatchReader(formatConfig.getReaderConfig(this));
+ }
+
+ @Override
+ protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+ FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+ builder.setReaderFactory(new ShpReaderFactory(new ShpReaderConfig(this)));
+ initScanBuilder(builder, scan);
+ builder.setNullType(Types.optional(TypeProtos.MinorType.VARCHAR));
Review comment:
This one is interesting. You've defined a fixed set of columns, yet I can
request others, such as `foo`. The line above says that `foo` will be defined
as nullable VARCHAR and filled with nulls. I wonder: when the set of columns
is fixed, should we return an error if the user requests an unknown column?
I think I saw a mailing list post about this recently where, for some format
or other, someone was enforcing the set of available columns.
Not a big deal (the current behavior is fail-soft), but worth considering...
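To make the two options concrete, here is a minimal standalone sketch in plain Java. It does not use Drill's scan framework API: the class `ProjectionCheck`, its methods, and the column names `gid`, `srid`, `shapeType` and `geom` are all hypothetical, chosen only for illustration. It contrasts listing the unknown columns (which the fail-soft path would null-fill) with a strict check that rejects them:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ProjectionCheck {
  // Hypothetical fixed column set; an assumption for this sketch,
  // not necessarily the columns the plugin actually defines.
  static final Set<String> KNOWN_COLUMNS =
      new LinkedHashSet<>(Arrays.asList("gid", "srid", "shapeType", "geom"));

  // Returns the projected columns that are not in the fixed set.
  // Under fail-soft behavior these would be filled with nulls.
  static List<String> unknownColumns(List<String> projected) {
    List<String> unknown = new ArrayList<>();
    for (String col : projected) {
      if (!KNOWN_COLUMNS.contains(col)) {
        unknown.add(col);
      }
    }
    return unknown;
  }

  // Strict alternative: fail as soon as any unknown column is requested.
  static void validateStrict(List<String> projected) {
    List<String> unknown = unknownColumns(projected);
    if (!unknown.isEmpty()) {
      throw new IllegalArgumentException("Unknown column(s) requested: " + unknown);
    }
  }
}
```

With this sketch, `validateStrict(Arrays.asList("gid", "foo"))` throws, while `unknownColumns` only reports `foo` and leaves the decision to the caller. The comparison here is on exact names; whether matching should be case-insensitive is left open.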
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> ESRI Shapefile (shp) format plugin
> ----------------------------------
>
> Key: DRILL-4303
> URL: https://issues.apache.org/jira/browse/DRILL-4303
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - Other
> Affects Versions: 1.17.0
> Reporter: Karol Potocki
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
>
> Allow Drill (drill-gis) to read ESRI shapefiles, one of the most popular
> geospatial formats.
> Format described here:
> https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
> It consists of three files (prj - SRID information, dbf - data fields, shp -
> geometry data)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)