processing escapes in a jute record is quadratic
------------------------------------------------
Key: HADOOP-1758
URL: https://issues.apache.org/jira/browse/HADOOP-1758
Project: Hadoop
Issue Type: Bug
Components: record
Affects Versions: 0.13.0
Reporter: Dick King
Priority: Blocker
The following code appears in hadoop/src/c++/librecordio/csvarchive.cc:
static void replaceAll(std::string s, const char *src, char c)
{
    std::string::size_type pos = 0;
    while (pos != std::string::npos) {
        pos = s.find(src);
        if (pos != std::string::npos) {
            s.replace(pos, strlen(src), 1, c);
        }
    }
}
It is used to replace the jute escape sequences during string deserialization:
void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
{
    t = readUptoTerminator(stream);
    if (t[0] != '\'') {
        throw new IOException("Errror deserializing string.");
    }
    t.erase(0, 1); /// erase first character
    replaceAll(t, "%0D", 0x0D);
    replaceAll(t, "%0A", 0x0A);
    replaceAll(t, "%7D", 0x7D);
    replaceAll(t, "%00", 0x00);
    replaceAll(t, "%2C", 0x2C);
    replaceAll(t, "%25", 0x25);
}
Since each call to replaceAll re-scans the string from the beginning for every
instance of its escape sequence, the total cost is quadratic in the length of
the string; practically anything would be better. I would propose that within
deserialize we allocate a char * [since each replacement is smaller than the
original], scan for each %, and either do a general hex conversion in place or
match one of the six patterns, and after each replacement shift the unmodified
text down and resume scanning for the % from that starting point.
-dk
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.