[
https://issues.apache.org/jira/browse/TRAFODION-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609960#comment-15609960
]
ASF GitHub Bot commented on TRAFODION-2127:
-------------------------------------------
Github user DaveBirdsall commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/772#discussion_r85226755
--- Diff: core/sql/executor/ExHdfsScan.cpp ---
@@ -1700,6 +1711,138 @@ char * ExHdfsScanTcb::extractAndTransformAsciiSourceToSqlRow(int &err,
   return NULL;
 }
+void ExHdfsScanTcb::computeRangesAtRuntime()
+{
+  int numFiles = 0;
+  Int64 totalSize = 0;
+  Int64 myShare = 0;
+  Int64 runningSum = 0;
+  Int64 myStartPositionInBytes = 0;
+  Int64 firstFileStartingOffset = 0;
+  Int64 lastFileBytesToRead = -1;
+  Int32 numParallelInstances = MAXOF(getGlobals()->getNumOfInstances(),1);
+  hdfsFS fs = ((GetCliGlobals()->currContext())->getHdfsServerConnection(
+                    hdfsScanTdb().hostName_,
+                    hdfsScanTdb().port_));
+  hdfsFileInfo *fileInfos = hdfsListDirectory(fs,
+                                              hdfsScanTdb().hdfsRootDir_,
+                                              &numFiles);
+
+  if (runTimeRanges_)
+    deallocateRuntimeRanges();
+
+  // in a first round, count the total number of bytes
+  for (int f=0; f<numFiles; f++)
+    {
+      ex_assert(fileInfos[f].mKind == kObjectKindFile,
+                "subdirectories not supported with runtime HDFS ranges");
+      totalSize += (Int64) fileInfos[f].mSize;
+    }
+
+  // compute my share, in bytes
+  // (the last of the ESPs may read a bit more)
+  myShare = totalSize / numParallelInstances;
+  myStartPositionInBytes = myInstNum_ * myShare;
+  beginRangeNum_ = -1;
+  numRanges_ = 0;
+
+  if (totalSize > 0)
+    {
+      // second round, find out the range of files I need to read
+      for (int g=0; g<numFiles; g++)
+        {
+          Int64 prevSum = runningSum;
+
+          runningSum += (Int64) fileInfos[g].mSize;
+
+          if (runningSum >= myStartPositionInBytes)
+            {
+              if (beginRangeNum_ < 0)
+                {
+                  // I have reached the first file that I need to read
+                  beginRangeNum_ = g;
+                  firstFileStartingOffset =
+                    myStartPositionInBytes - prevSum;
+                }
+
+              numRanges_++;
+
+              if (runningSum > (myStartPositionInBytes + myShare) &&
+                  myInstNum_ < numParallelInstances-1)
+                {
+                  // the next file is beyond the range that I need to read
+                  lastFileBytesToRead =
+                    myStartPositionInBytes + myShare - prevSum;
+                  break;
+                }
+            }
+        }
+
+      // now that we know how many ranges we need, allocate them
+      numRunTimeRanges_ = numRanges_;
+      runTimeRanges_ = new(getHeap()) HdfsFileInfo[numRunTimeRanges_];
+    }
+  else
+    beginRangeNum_ = 0;
+
+  // third round, populate the ranges that this ESP needs to read
+  for (int h=beginRangeNum_; h<beginRangeNum_+numRanges_; h++)
--- End diff --
In this design, two or more ESPs may read a section of the same file. Just
curious: Is there added latency or overhead introduced by doing this? (Old
thinking, I know; I have visions of disk head dances. Main memory files
wouldn't have an issue.) Also, we seem to be byte-oriented in our boundaries
without regard to record boundaries. I take it in some other layer we figure
out where records begin and end, and read across boundaries or after them as
needed?
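For reference, the split arithmetic the diff implements can be sketched stand-alone. This is only an illustration: the names `FileSlice` and `computeSlice` are invented here, not from the Trafodion sources. Each of N instances takes totalSize/N bytes starting at instNum * share, and the last instance absorbs the remainder. On the record-boundary question, the usual convention (as in Hadoop's TextInputFormat; this is an assumption about Trafodion's lower layer, not confirmed by the diff) is that a reader whose start offset falls mid-record skips forward to the first record delimiter and reads past its end offset to finish the last record it started, so purely byte-oriented splits still produce whole records exactly once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone sketch of the byte-range split in
// computeRangesAtRuntime(); names are illustrative, not Trafodion's.
struct FileSlice {
  int64_t start;   // starting byte offset in the concatenated files
  int64_t length;  // number of bytes this instance reads
};

FileSlice computeSlice(int64_t totalSize, int numInstances, int instNum) {
  int64_t share = totalSize / numInstances;   // even share, rounded down
  int64_t start = instNum * share;            // my starting position
  // the last instance reads to the end, picking up the remainder
  int64_t length =
      (instNum == numInstances - 1) ? totalSize - start : share;
  return {start, length};
}
```

With 100 bytes over 3 instances, instances 0 and 1 each get 33 bytes and instance 2 gets 34, mirroring the "last ESP may read a bit more" comment in the diff.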
> enhance Trafodion implementation of WITH clause
> -----------------------------------------------
>
> Key: TRAFODION-2127
> URL: https://issues.apache.org/jira/browse/TRAFODION-2127
> Project: Apache Trafodion
> Issue Type: Improvement
> Reporter: liu ming
> Assignee: Hans Zeller
>
> TRAFODION-1673 described some details about how to support the WITH clause
> in Trafodion.
> As an initial implementation, we use a simple pure-parser method.
> That way, Trafodion can support the WITH clause functionally, but it is not
> good from a performance point of view; the parser also needs to be made
> stricter about the syntax it accepts.
> This JIRA is a follow-up, to track the remaining work to support the
> WITH clause in Trafodion.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)