[ 
https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978122#action_12978122
 ] 

stack commented on HBASE-3417:
------------------------------

As discussed up on IRC, this is not backward compatible:

{code}
+    Pattern.compile("^(\\w{32})(?:\\.(.+))?$");
{code}

You can do a range, IIRC, e.g. {20,32} (was the old length 20 chars?)
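
A rough sketch of what the range version could look like. The lower bound of 20 is only the guess from the line above, not a confirmed old name length, and the class and method names here are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class, not part of the patch.
class StoreFileNamePattern {
    // Assumption: old names are at least 20 word chars, new names are 32.
    // The optional group after the dot captures a suffix, as in the patch.
    static final Pattern NAME =
        Pattern.compile("^(\\w{20,32})(?:\\.(.+))?$");

    // True if the name is 20 to 32 word chars, optionally followed by ".suffix".
    static boolean matches(String name) {
        return NAME.matcher(name).matches();
    }

    // Returns the text after the dot, or null if there is no suffix or no match.
    static String suffixOf(String name) {
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(2) : null;
    }
}
```

If old names turn out to be shorter or of variable length, the lower bound would need adjusting to suit.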

The below is a little bit messy:

{code}
+    return new Path(dir, UUID.randomUUID().toString().replaceAll("-", "")
+        + ((suffix == null || suffix.length() <= 0) ? "" : suffix));
{code}
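
One possible tidier shape, as a sketch only. It builds just the file name as a String; wrapping it in new Path(dir, name) is left to the caller, and the class and method names are hypothetical:

```java
import java.util.UUID;

// Hypothetical helper, not part of the patch.
class UniqueNames {
    // Returns 32 hex chars from a random UUID, plus the suffix if one is given.
    static String uniqueName(String suffix) {
        // replace() takes a literal, so no regex is compiled as with replaceAll().
        String base = UUID.randomUUID().toString().replace("-", "");
        return (suffix == null || suffix.isEmpty()) ? base : base + suffix;
    }
}
```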

Up on IRC, I was thinking we should base64 it because then it'd be more compact.  See 
http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string.  There 
is also in hbase util a Base64#encodeBytes method that will take the 128 UUID 
bits and emit them as base64 (possible to get it all down to 22 chars).  But 
looking at the base64 vocabulary, http://en.wikipedia.org/wiki/Base64, it 
includes '+' and '/', which are trouble in a URL or an hdfs filepath.  Base32?  
http://en.wikipedia.org/wiki/Base32  But that doesn't fit cleanly either: it 
encodes in groups of 40 bits, and 128 is not a multiple of 40, so it would 
need padding.
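
For illustration, here is a sketch of the 22-char idea using the URL-safe base64 alphabet, which swaps '+' and '/' for '-' and '_'. Note this uses the JDK's java.util.Base64 (Java 8 and later) as a stand-in, not the hbase util Base64 class mentioned above, and the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper; uses the JDK encoder, not HBase's Base64 util.
class CompactUuid {
    // Packs the 128 UUID bits into 16 bytes and encodes them without '=' padding.
    static String encode(UUID id) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(id.getMostSignificantBits());
        buf.putLong(id.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Reverses encode(): decodes the 16 bytes and rebuilds the UUID.
    static UUID decode(String s) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getUrlDecoder().decode(s));
        return new UUID(buf.getLong(), buf.getLong());
    }
}
```

One catch: the URL-safe alphabet includes '-', which \w in the filename pattern does not match, so the regex would have to change along with it.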

Maybe leave it as it comes out of UUID.toString w/ hyphens.  Then it's plain it's 
a UUID and it's easier to read?



> CacheOnWrite is using the temporary output path for block names, need to use 
> a more consistent block naming scheme
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3417
>                 URL: https://issues.apache.org/jira/browse/HBASE-3417
>             Project: HBase
>          Issue Type: Bug
>          Components: io, regionserver
>    Affects Versions: 0.92.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch
>
>
> Currently the block names used in the block cache are built using the 
> filesystem path.  However, for cache on write, the path is a temporary output 
> file.
> The original COW patch actually made some modifications to block naming stuff 
> to make it more consistent but did not do enough.  Should add a separate 
> method somewhere for generating block names using some more easily mocked 
> scheme (rather than just raw path as we generate a random unique file name 
> twice, once for tmp and then again when moved into place).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
