[ 
https://issues.apache.org/jira/browse/HADOOP-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523200
 ] 

Vivek Ratan commented on HADOOP-1758:
-------------------------------------

I propose sticking to Dick's suggestion: after reading the string, we replace 
all the characters in one shot. That's the way the Java version of RecordIO 
does it too, and it makes sense to maintain that symmetry. 

> processing escapes in a jute record is quadratic
> ------------------------------------------------
>
>                 Key: HADOOP-1758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1758
>             Project: Hadoop
>          Issue Type: Bug
>          Components: record
>    Affects Versions: 0.13.0
>            Reporter: Dick King
>            Assignee: Vivek Ratan
>            Priority: Blocker
>             Fix For: 0.15.0
>
>
> The following code appears in hadoop/src/c++/librecordio/csvarchive.cc :
> static void replaceAll(std::string s, const char *src, char c)
> {
>   std::string::size_type pos = 0;
>   while (pos != std::string::npos) {
>     pos = s.find(src);
>     if (pos != std::string::npos) {
>       s.replace(pos, strlen(src), 1, c);
>     }
>   }
> }
> This is used in the context of replacing jute escapes in the code:
> void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
> {
>   t = readUptoTerminator(stream);
>   if (t[0] != '\'') {
>     throw new IOException("Errror deserializing string.");
>   }
>   t.erase(0, 1); /// erase first character
>   replaceAll(t, "%0D", 0x0D);
>   replaceAll(t, "%0A", 0x0A);
>   replaceAll(t, "%7D", 0x7D);
>   replaceAll(t, "%00", 0x00);
>   replaceAll(t, "%2C", 0x2C);
>   replaceAll(t, "%25", 0x25);
> }
> Since this replaces the entire string for each instance of the escape 
> sequence, practically anything would be better.  I would propose that within 
> deserialize we allocate a char * [since each replacement is smaller than the 
> original], scan for each %, and either do a general hex conversion in place 
> or look for one of the six patterns, and after each replacement move down the 
> unmodified text and scan for the % fom that starting point.
> -dk

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to