[jira] [Commented] (PIG-4003) Error is thrown by JobStats.getOutputSize() when storing to a Hive table

Rohini Palaniswamy (JIRA) Tue, 24 Jun 2014 13:22:41 -0700

    [ https://issues.apache.org/jira/browse/PIG-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042623#comment-14042623 ]


Rohini Palaniswamy commented on PIG-4003:
-----------------------------------------

Patch looks good. Can you add the property to pig-default.properties and set 
org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer as 
the default value, since they are the well-known ones?
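
For illustration, a minimal sketch of how such a comma-separated property could be read, with the two HCatStorer classes as the fallback. The property key, class name, and method names below are assumptions made for this sketch and are not taken from the patch.

```java
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class UnsupportedStoreFuncs {
    // Assumed property key; the real key is whatever the patch defines.
    static final String KEY = "pig.stats.output.size.reader.unsupported";

    // Defaults requested in the comment: the two well-known HCatStorer classes.
    static final String DEFAULTS =
        "org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer";

    // Parse the comma-separated class list, falling back to the defaults
    // when the property is not set.
    static Set<String> unsupported(Properties props) {
        String value = props.getProperty(KEY, DEFAULTS);
        Set<String> names = new HashSet<>();
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // True if a file-based size reader should be attempted for this store function.
    static boolean supports(String storeFuncClass, Properties props) {
        return !unsupported(props).contains(storeFuncClass);
    }
}
```

With defaults like these, users storing through HCatStorer would not need to set anything to avoid the warning, while other store functions keep the existing behavior.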

> Error is thrown by JobStats.getOutputSize() when storing to a Hive table 
> -------------------------------------------------------------------------
>
>                 Key: PIG-4003
>                 URL: https://issues.apache.org/jira/browse/PIG-4003
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4003-1.patch, PIG-4003-2.patch, PIG-4003-3.patch
>
>
> Here is an example of the stack trace printed to the console. Technically, 
> this is a warning and does not make the job fail. However, it is 
> certainly not user-friendly.
> {code}
> 4/06/09 16:20:28 WARN pigstats.JobStats: unable to find the output file
> java.io.FileNotFoundException: File hdfs://10.61.10.185:9000/user/cheolsoop/prodhive.benchmark.unittest_vhs_bitrate_asn_sum_stg_test2 does not exist.
>       at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>       at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>       at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader.getOutputSize(FileBasedOutputSizeReader.java:65)
>       at org.apache.pig.tools.pigstats.JobStats.getOutputSize(JobStats.java:352)
> {code}
> The issue is that FileBasedOutputSizeReader misinterprets the Hive table name as 
> an HDFS path.
> {code}
> @Override
> public boolean supports(POStore sto, Configuration conf) {
>     return UriUtil.isHDFSFileOrLocalOrS3N(getLocationUri(sto), conf);
> }
> {code}
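
The path in the trace looks like a table name resolved against the user's HDFS home directory. The sketch below only reproduces that effect with java.net.URI. It is not Pig's actual code path, and the namenode, user directory, and table name are hypothetical.

```java
import java.net.URI;

public class TableNameAsPath {
    // Resolve a store location against a working directory,
    // the way a relative file path would be resolved.
    static URI resolve(String workingDir, String location) {
        return URI.create(workingDir).resolve(location);
    }

    public static void main(String[] args) {
        // A bare "db.table" name has no scheme, so it is treated as a relative path.
        URI resolved = resolve("hdfs://namenode:9000/user/alice/", "mydb.mytable");
        System.out.println(resolved);             // hdfs://namenode:9000/user/alice/mydb.mytable
        System.out.println(resolved.getScheme()); // hdfs
    }
}
```

Once resolved this way, the location is indistinguishable from an ordinary HDFS file URI, so a check based only on the URI scheme accepts it.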



--
This message was sent by Atlassian JIRA
(v6.2#6252)
