[ https://issues.apache.org/jira/browse/HADOOP-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525046 ]
Hadoop QA commented on HADOOP-1758:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12364812/1758_01.patch applied and successfully tested against trunk revision r572826.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/688/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/688/console

> processing escapes in a jute record is quadratic
> ------------------------------------------------
>
>                 Key: HADOOP-1758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1758
>             Project: Hadoop
>          Issue Type: Bug
>          Components: record
>    Affects Versions: 0.13.0
>            Reporter: Dick King
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.15.0
>
>         Attachments: 1758_01.patch
>
>
> The following code appears in hadoop/src/c++/librecordio/csvarchive.cc:
>
> static void replaceAll(std::string s, const char *src, char c)
> {
>   std::string::size_type pos = 0;
>   while (pos != std::string::npos) {
>     pos = s.find(src);
>     if (pos != std::string::npos) {
>       s.replace(pos, strlen(src), 1, c);
>     }
>   }
> }
>
> This is used in the context of replacing jute escapes in the code:
>
> void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
> {
>   t = readUptoTerminator(stream);
>   if (t[0] != '\'') {
>     throw new IOException("Errror deserializing string.");
>   }
>   t.erase(0, 1); /// erase first character
>   replaceAll(t, "%0D", 0x0D);
>   replaceAll(t, "%0A", 0x0A);
>   replaceAll(t, "%7D", 0x7D);
>   replaceAll(t, "%00", 0x00);
>   replaceAll(t, "%2C", 0x2C);
>   replaceAll(t, "%25", 0x25);
> }
>
> Since every search restarts from the beginning of the string and every
> replacement shifts the entire remainder of the string, each escape costs
> time proportional to the string length, so the total is quadratic;
> practically anything would be better.
> I would propose that within deserialize we allocate a char * (since each
> replacement is shorter than the original), scan for each %, and either do a
> general hex conversion in place or look for one of the six patterns. After
> each replacement, move the unmodified text down and scan for the next % from
> that starting point.
>
> -dk

-- 
This message is automatically generated by JIRA.
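The proposal above can be sketched in C++ as a single left-to-right pass, assuming the "general hex conversion" option. This is a minimal illustration, not the attached 1758_01.patch. The names `decodeEscapesInPlace` and `hexValue` are hypothetical, and the sketch assumes that a `%` not followed by two hex digits is copied through unchanged. Because every escape shrinks three characters to one, the write index never overtakes the read index. The string's own buffer can therefore serve as the destination, instead of a separately allocated `char *`.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
static int hexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Hypothetical single-pass replacement for the six replaceAll() calls:
// decodes every "%XX" escape in place, in O(n) time.
// r is the read index and w is the write index; decoded bytes are written
// behind r and never rescanned, so "%2525" decodes once, to "%25".
static void decodeEscapesInPlace(std::string& s)
{
  std::string::size_type w = 0;
  for (std::string::size_type r = 0; r < s.size(); ++w) {
    assert(w <= r);  // output never grows, so writing never clobbers unread input
    if (s[r] == '%' && r + 2 < s.size()) {
      int hi = hexValue(s[r + 1]);
      int lo = hexValue(s[r + 2]);
      if (hi >= 0 && lo >= 0) {
        s[w] = static_cast<char>(hi * 16 + lo);  // e.g. "%2C" -> ','
        r += 3;                                  // skip the whole escape
        continue;                                // ++w still runs
      }
    }
    s[w] = s[r];  // ordinary character (or malformed escape): copy as-is
    ++r;
  }
  s.resize(w);  // drop the now-unused tail
}
```

In `deserialize`, the six `replaceAll` calls would become a single `decodeEscapesInPlace(t);` after `t.erase(0, 1);`. For example, `a%2Cb%2525` decodes to `a,b%25`. Unlike looking for only the six patterns, this general conversion also decodes any other well-formed `%XX` escape.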