[jira] [Updated] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

Mostafa Mokhtar (JIRA) Mon, 29 Sep 2014 12:12:15 -0700

     [ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mostafa Mokhtar updated HIVE-8292:
----------------------------------
    Description: 
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

And 45% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
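
To make clear what the expression above computes, here is a minimal standalone sketch using only java.net.URI. URI.relativize returns its argument unchanged when the argument is not located under the base URI, so comparing the result with the original URI amounts to a containment test. The class name, the method name, and the hdfs:// paths are hypothetical and are not taken from Hive. Each call normalizes both paths, which is consistent with the URI.normalize and URI.equal samples in the profile.

```java
import java.net.URI;

// Illustrative only: RelativizeCheck and isUnder are hypothetical names, not Hive code.
class RelativizeCheck {

    // True when 'child' lies under 'parent' (same scheme and authority, and the
    // parent path is a prefix of the child path). URI.relativize returns 'child'
    // unchanged when that is not the case, so equality means "not contained".
    static boolean isUnder(URI parent, URI child) {
        return !parent.relativize(child).equals(child);
    }

    public static void main(String[] args) {
        // Hypothetical partition directory and bucket files.
        URI partition = URI.create("hdfs://nn/warehouse/t/p=1");
        URI inside = URI.create("hdfs://nn/warehouse/t/p=1/000000_0");
        URI outside = URI.create("hdfs://nn/warehouse/t/p=2/000000_0");

        System.out.println(partition.relativize(inside)); // prints 000000_0
        System.out.println(isUnder(partition, inside));   // prints true
        System.out.println(isUnder(partition, outside));  // prints false
    }
}
```

If this test runs against every configured path each time the check is made, the cost grows with the number of partition paths, which may explain why partitioned tables are hit hardest; that reading is an assumption, not something the profile states directly.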

From the profiler:
{code}
Stack Trace   Sample Count   Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)   5,327   62.348
   hive.ql.exec.vector.VectorMapOperator.process(Writable)   5,326   62.336
      hive.ql.exec.Operator.cleanUpInputFileChanged()   4,851   56.777
         hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()   4,849   56.753
            java.net.URI.relativize(URI)   3,903   45.681
               java.net.URI.relativize(URI, URI)   3,903   45.681
                  java.net.URI.normalize(String)   2,169   25.386
                  java.net.URI.equal(String, String)   526   6.156
                  java.net.URI.equalIgnoringCase(String, String)   1   0.012
                  java.lang.String.substring(int)   1   0.012
            hive.ql.exec.MapOperator.normalizePath(String)   506   5.922
            org.apache.commons.logging.impl.Log4JLogger.info(Object)   32   0.375
            java.net.URI.equals(Object)   12   0.14
            java.util.HashMap$KeySet.iterator()   5   0.059
            java.util.HashMap.get(Object)   4   0.047
            java.util.LinkedHashMap.get(Object)   3   0.035
         hive.ql.exec.Operator.cleanUpInputFileChanged()   1   0.012
      hive.ql.exec.Operator.forward(Object, ObjectInspector)   473   5.536
      hive.ql.exec.mr.ExecMapperContext.inputFileChanged()   1   0.012
{code}


  was:
Reading from bucketed partitioned tables has significantly higher overhead 
compared to non-bucketed non-partitioned files.


50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp

5% of the CPU in 
{code}
 Path onepath = normalizePath(onefile);
{code}

And 45% of the CPU in 
{code}
 onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}

From the profiler:
{code}
Stack Trace   Sample Count   Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)   978   28.613
   org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)   978   28.613
      org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()   866   25.336
         org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()   866   25.336
            java.net.URI.relativize(URI)   655   19.163
               java.net.URI.relativize(URI, URI)   655   19.163
                  java.net.URI.normalize(String)   517   15.126
                     java.net.URI.needsNormalization(String)   372   10.884
                        java.lang.String.charAt(int)   235   6.875
                  java.net.URI.equal(String, String)   27   0.79
                  java.lang.StringBuilder.toString()   1   0.029
                  java.lang.StringBuilder.<init>()   1   0.029
                  java.lang.StringBuilder.append(String)   1   0.029
            org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)   167   4.886
               org.apache.hadoop.fs.Path.<init>(String)   162   4.74
                  org.apache.hadoop.fs.Path.initialize(String, String, String, String)   162   4.74
                     org.apache.hadoop.fs.Path.normalizePath(String, String)   97   2.838
                        org.apache.commons.lang.StringUtils.replace(String, String, String)   97   2.838
                           org.apache.commons.lang.StringUtils.replace(String, String, String, int)   97   2.838
                              java.lang.String.indexOf(String, int)   97   2.838
                     java.net.URI.<init>(String, String, String, String, String)   65   1.902
{code}



> Reading from partitioned bucketed tables has high overhead in 
> MapOperator.cleanUpInputFileChangedOp
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8292
>                 URL: https://issues.apache.org/jira/browse/HIVE-8292
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>         Environment: cn105
>            Reporter: Mostafa Mokhtar
>            Assignee: Prasanth J
>             Fix For: 0.14.0
>
>         Attachments: 2014_09_29_14_46_04.jfr
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
