[ 
https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3198:
------------------------

    Description: 
I'm working on a custom StorageHandler implementation. I use 
configureTableJobProperties to pass properties on to a SerDe and InputFormat, 
but it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate 
JobProperties:
{code}
public class HiveInputFormat<K extends WritableComparable, V extends Writable>
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't 
get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (e.g. "/data/bar/part-00000") but 
pathToPartitionInfo has an entry for the *dir* (e.g. "/data/bar").
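
The mismatch can be reproduced outside Hive with a minimal sketch. This is not 
Hive code: the class name, the helper methods and the String-valued map are 
hypothetical stand-ins for pathToPartitionInfo and PartitionDesc, and paths are 
assumed to be plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Hive.
class ExactPathLookupDemo {

    // Stand-in for pathToPartitionInfo: keyed by the table/partition directory.
    static Map<String, String> buildPathToPartitionInfo() {
        Map<String, String> info = new HashMap<>();
        info.put("/data/bar", "partition desc for test3");
        return info;
    }

    // Mirrors the exact-match get() used in the snippet above.
    static String exactLookup(Map<String, String> info, String splitPath) {
        return info.get(splitPath);
    }

    public static void main(String[] args) {
        Map<String, String> info = buildPathToPartitionInfo();
        // The split reports the file, so the directory-keyed map has no entry.
        System.out.println(exactLookup(info, "/data/bar/part-00000"));
        // Only the directory itself matches.
        System.out.println(exactLookup(info, "/data/bar"));
    }
}
```

With the file path the lookup returns null, which corresponds to the part==null 
case seen in the debugger.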

I attached a patch which fixes the problem for me; it makes things explicit by 
passing along the directory name inside the HiveInputSplit; this means we don't 
have to figure out which files are part of which partition.


       Assignee: Navis
        Summary: Table properties of non-native table are not transferred to 
RecordReader  (was: StorageHandler properties not passed to InputFormat (?))

For non-native tables, Hive delegates the creation of input splits and record 
readers to HiveInputFormat. But most input formats in Hadoop replace a 
directory (the location of the table or partition) with the concrete file names 
inside it, so a simple map lookup in pathToPartitionInfo fails to find the 
matching partition desc.

It can be fixed simply by searching for the partition recursively, which 
CombineHiveInputFormat already does, as commented below. But it seemed too hard 
to make a proper test case for this, so I'll just upload the code patch.
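
The recursive search described above could look roughly like the following 
sketch. It is not the attached patch and not the CombineHiveInputFormat code: 
the class and method names are hypothetical, paths are assumed to be plain 
'/'-separated strings without a scheme or authority, and a generic map stands 
in for pathToPartitionInfo.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of Hive.
class ParentPathLookupSketch {

    // Returns the entry for the path itself or for its nearest ancestor
    // directory, or null when no ancestor is registered.
    static <V> V findByNearestAncestor(Map<String, V> pathToInfo, String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            V hit = pathToInfo.get(current);
            if (hit != null) {
                return hit;
            }
            int slash = current.lastIndexOf('/');
            if (slash <= 0) {
                break; // reached the root or a path without a separator
            }
            current = current.substring(0, slash); // step up one directory
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> pathToInfo = new HashMap<>();
        pathToInfo.put("/data/bar", "partition desc for test3");
        // The file path resolves through its parent directory.
        System.out.println(findByNearestAncestor(pathToInfo, "/data/bar/part-00000"));
        // An unrelated path still yields null.
        System.out.println(findByNearestAncestor(pathToInfo, "/other/file"));
    }
}
```

Walking up from the file keeps the split unchanged, at the cost of a few extra 
map lookups per record reader. An entry registered for the root directory "/" 
is not handled by this sketch.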
                
> Table properties of non-native table are not transferred to RecordReader
> ------------------------------------------------------------------------
>
>                 Key: HIVE-3198
>                 URL: https://issues.apache.org/jira/browse/HIVE-3198
>             Project: Hive
>          Issue Type: Bug
>         Environment: trunk r1352973
>            Reporter: Brian Bloniarz
>            Assignee: Navis
>         Attachments: TestStorageHandler.java, inputformat.patch
>
