[ https://issues.apache.org/jira/browse/HIVE-11118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617271#comment-14617271 ]
Sushanth Sowmyan commented on HIVE-11118:
-----------------------------------------
I have a question here - I will open another bug if need be, but if it's a
simple misunderstanding, it won't matter.
From the patch, I see the following bit:
{code}
private void ensureFileFormatsMatch(TableSpec ts, URI fromURI) throws SemanticException {
  Class<? extends InputFormat> destInputFormat = ts.tableHandle.getInputFormatClass();
  // Other file formats should do similar check to make sure file formats match
  // when doing LOAD DATA .. INTO TABLE
  if (OrcInputFormat.class.equals(destInputFormat)) {
    Path inputFilePath = new Path(fromURI);
    try {
      FileSystem fs = FileSystem.get(fromURI, conf);
      // just creating orc reader is going to do sanity checks to make sure its valid ORC file
      OrcFile.createReader(fs, inputFilePath);
    } catch (FileFormatException e) {
      throw new SemanticException(ErrorMsg.INVALID_FILE_FORMAT_IN_LOAD.getMsg("Destination" +
          " table is stored as ORC but the file being loaded is not a valid ORC file."));
    } catch (IOException e) {
      throw new SemanticException("Unable to load data to destination table." +
          " Error: " + e.getMessage());
    }
  }
}
{code}
Now, it's entirely possible that the table in question is an ORC table, but the
partition being loaded is in another format, such as Text; Hive supports tables
whose partitions have mixed formats. In fact, this is a likely scenario when
replicating a table that used to be Text but has since been converted to ORC,
so that all new partitions are ORC. In that case, the destination table will be
a MANAGED_TABLE and an ORC table, but import will try to load a Text partition
into it.
Shouldn't this check refer to the partition spec rather than the table's input
format in order to work in that scenario?
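To make the question concrete, here is a minimal, self-contained sketch of the
lookup order I have in mind. It does not use Hive's classes: TableInfo,
formatToValidate and needsOrcCheck are hypothetical stand-ins invented for
illustration, and the assumption that an existing partition records its own
input format separately from the table is mine, not something taken from the
patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LoadFormatCheckSketch {

    // Real Hive/Hadoop class names, used here only as plain string labels.
    static final String ORC = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
    static final String TEXT = "org.apache.hadoop.mapred.TextInputFormat";

    /** Hypothetical stand-in for a table: a default format plus per-partition formats. */
    static final class TableInfo {
        final String defaultInputFormat;
        final Map<String, String> partitionInputFormats = new HashMap<>();

        TableInfo(String defaultInputFormat) {
            this.defaultInputFormat = defaultInputFormat;
        }
    }

    /**
     * Picks the input format the loaded file should be validated against.
     * A partition that already exists wins; otherwise (no partition given,
     * or a partition that does not exist yet) the table default applies.
     */
    static String formatToValidate(TableInfo table, Optional<String> partitionName) {
        return partitionName
                // Optional.map yields an empty Optional when the lookup returns null.
                .map(table.partitionInputFormats::get)
                .orElse(table.defaultInputFormat);
    }

    /** True when the ORC sanity check should run for this load target. */
    static boolean needsOrcCheck(TableInfo table, Optional<String> partitionName) {
        return ORC.equals(formatToValidate(table, partitionName));
    }

    public static void main(String[] args) {
        // A table converted to ORC that still has one older Text partition.
        TableInfo table = new TableInfo(ORC);
        table.partitionInputFormats.put("ds=2015-01-01", TEXT);

        System.out.println(needsOrcCheck(table, Optional.of("ds=2015-01-01")));
        System.out.println(needsOrcCheck(table, Optional.of("ds=2015-07-07")));
        System.out.println(needsOrcCheck(table, Optional.empty()));
    }
}
```

With this ordering, loading into the older Text partition would skip the ORC
reader check, while a new partition or an unpartitioned load would still be
validated against the table's ORC format.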
> Load data query should validate file formats with destination tables
> --------------------------------------------------------------------
>
> Key: HIVE-11118
> URL: https://issues.apache.org/jira/browse/HIVE-11118
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11118.2.patch, HIVE-11118.3.patch,
> HIVE-11118.4.patch, HIVE-11118.patch
>
>
> Load data local inpath queries do not do any validation with respect to file
> format. If the destination table is ORC and we try to load files that are not
> ORC, the load will succeed, but querying such tables will result in runtime
> exceptions. We can do some simple sanity checks to prevent loading of files
> that do not match the destination table's file format.