[ https://issues.apache.org/jira/browse/HADOOP-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522656 ]

Owen O'Malley commented on HADOOP-1758:
---------------------------------------

Note that this code is not only slow, it doesn't work. The string is passed in by value and then modified. *oops*

Note that since '%' is always escaped, this can be done using sscanf to read the two digits and convert them to a byte.

> processing escapes in a jute record is quadratic
> ------------------------------------------------
>
>                 Key: HADOOP-1758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1758
>             Project: Hadoop
>          Issue Type: Bug
>          Components: record
>    Affects Versions: 0.13.0
>            Reporter: Dick King
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.15.0
>
>
> The following code appears in hadoop/src/c++/librecordio/csvarchive.cc :
> static void replaceAll(std::string s, const char *src, char c)
> {
>   std::string::size_type pos = 0;
>   while (pos != std::string::npos) {
>     pos = s.find(src);
>     if (pos != std::string::npos) {
>       s.replace(pos, strlen(src), 1, c);
>     }
>   }
> }
> This is used in the context of replacing jute escapes in the code:
> void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
> {
>   t = readUptoTerminator(stream);
>   if (t[0] != '\'') {
>     throw new IOException("Errror deserializing string.");
>   }
>   t.erase(0, 1); /// erase first character
>   replaceAll(t, "%0D", 0x0D);
>   replaceAll(t, "%0A", 0x0A);
>   replaceAll(t, "%7D", 0x7D);
>   replaceAll(t, "%00", 0x00);
>   replaceAll(t, "%2C", 0x2C);
>   replaceAll(t, "%25", 0x25);
> }
> Since this replaces the entire string for each instance of the escape
> sequence, practically anything would be better. I would propose that within
> deserialize we allocate a char * [since each replacement is smaller than the
> original], scan for each %, and either do a general hex conversion in place
> or look for one of the six patterns, and after each replacement move down the
> unmodified text and scan for the % from that starting point.
> -dk

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
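
For illustration, the single-pass idea discussed above (scan for each '%', decode the two hex digits that follow, and move the remaining text down as you go) might look roughly like the sketch below. hexValue and unescapeInPlace are hypothetical names, not functions in librecordio. The sketch decodes the digits by hand rather than with sscanf, and it assumes a malformed or truncated escape should simply be copied through unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in librecordio): value of one hex digit, or -1.
static int hexValue(char ch)
{
  if (ch >= '0' && ch <= '9') return ch - '0';
  if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
  if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
  return -1;
}

// Hypothetical stand-in for the six replaceAll() calls: decodes every "%XX"
// escape in one left-to-right pass. The string is taken by reference, so the
// caller sees the result, and each character is visited once.
static void unescapeInPlace(std::string& s)
{
  std::size_t out = 0;  // next write position; never ahead of the read position
  for (std::size_t in = 0; in < s.size(); ++in) {
    if (s[in] == '%' && in + 2 < s.size()) {
      const int hi = hexValue(s[in + 1]);
      const int lo = hexValue(s[in + 2]);
      if (hi >= 0 && lo >= 0) {
        s[out++] = static_cast<char>(hi * 16 + lo);
        in += 2;  // skip the two hex digits just consumed
        continue;
      }
    }
    s[out++] = s[in];  // ordinary character or malformed escape: copy as-is
  }
  s.resize(out);  // drop the tail left over after compaction
}
```

Under these assumptions, deserialize would make one call to unescapeInPlace(t) after erasing the leading quote, in place of the six replaceAll calls. Because the decoded '%' is written behind the read position and never rescanned, an input such as "%250A" decodes to "%0A" rather than to a newline.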