[jira] [Comment Edited] (ACCUMULO-118) accumulo could work across HDFS instances, which would help it to scale past a single namenode

Keith Turner (JIRA) Tue, 28 May 2013 13:22:05 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668603#comment-13668603 ]


Keith Turner edited comment on ACCUMULO-118 at 5/28/13 8:21 PM:
----------------------------------------------------------------

I was looking at some docs on viewfs.  If possible, I am thinking we should not 
do anything that would preclude using viewfs.   It seems like if URIs were 
supported for tablet dirs and files (along with a way to choose a tablet dir) 
that this would almost be enough to support viewfs.

{noformat}
  1;m srv:dir viewfs://clusterX/accumulo1/tables/abc
  1;m file:viewfs://clusterX/accumulo1/tables/abc/F0000002.rf []    196,1

  1< srv:dir viewfs://clusterX/accumulo2/tables/abc
  1< file:viewfs://clusterX/accumulo2/tables/abc/F0000003.rf []    196,1
{noformat}
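
One possible reading of "a way to choose a tablet dir" is sketched below. It assumes a configured list of base URIs and a hash-based choice; the function name `choose_tablet_dir`, the hashing scheme, and the use of a tablet name as input are all hypothetical and not taken from Accumulo.

```python
import zlib


def choose_tablet_dir(base_dirs, table_id, tablet_name):
    """Pick one base URI for a new tablet and return its table directory.

    Hypothetical policy: hash the table id and tablet name so the same
    tablet always maps to the same base dir.
    """
    if not base_dirs:
        raise ValueError("at least one base dir is required")
    key = f"{table_id}/{tablet_name}".encode("utf-8")
    index = zlib.crc32(key) % len(base_dirs)
    return f"{base_dirs[index].rstrip('/')}/tables/{table_id}"


if __name__ == "__main__":
    bases = [
        "viewfs://clusterX/accumulo1",
        "viewfs://clusterX/accumulo2",
    ]
    print(choose_tablet_dir(bases, "abc", "t-0001"))
```

A pluggable chooser (round robin, least used, per-table pinning) would fit the same signature; the hash is only the simplest deterministic option.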

If we want to further develop our own indirection layer, then maybe we should 
define our own URI prefix.   Something like ans://.  How independent should 
this URI be?  Something like ans://<namespace name>/<path> would assume that 
you know where to look <namespace name> up.  If the URI were like 
ans://<zookeepers>+<instance id>+<namespace name>/<path> then it would be more 
self contained.   I do not think it's necessary to make it self contained; it's 
for internal use and would be translated as needed.
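
To make the two URI shapes concrete, here is a minimal sketch of parsing them and translating the short form to a real filesystem URI. The ans:// scheme is only a proposal in this comment; `AnsUri`, `parse_ans_uri`, `resolve`, and the namespace-to-root mapping are hypothetical names for illustration.

```python
from typing import NamedTuple, Optional

ANS_PREFIX = "ans://"  # proposed scheme, not an existing one


class AnsUri(NamedTuple):
    zookeepers: Optional[str]   # only present in the self-contained form
    instance_id: Optional[str]  # only present in the self-contained form
    namespace: str
    path: str


def parse_ans_uri(uri: str) -> AnsUri:
    """Parse ans://<ns>/<path> or ans://<zk>+<instance id>+<ns>/<path>."""
    if not uri.startswith(ANS_PREFIX):
        raise ValueError(f"not an ans URI: {uri}")
    authority, _, rest = uri[len(ANS_PREFIX):].partition("/")
    parts = authority.split("+")
    if len(parts) == 1 and parts[0]:
        return AnsUri(None, None, parts[0], "/" + rest)
    if len(parts) == 3 and all(parts):
        return AnsUri(parts[0], parts[1], parts[2], "/" + rest)
    raise ValueError(f"malformed ans authority: {authority}")


def resolve(uri: str, namespaces: dict) -> str:
    """Translate an ans URI using a namespace name -> filesystem root map."""
    parsed = parse_ans_uri(uri)
    return namespaces[parsed.namespace].rstrip("/") + parsed.path


if __name__ == "__main__":
    print(resolve("ans://ns1/accumulo/tables/abc", {"ns1": "hdfs://nn1"}))
```

With the short form, the caller has to supply the namespace map (for example from ZooKeeper); the long form carries enough information to find that map on its own.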

I was thinking about how bulk import will work in this federated world.  Below 
is one way this could work.

 * Client calls import dir w/ /foo1
 * Accumulo client code uses local config to convert /foo1 to URI hdfs://nn1/foo1
 * hdfs://nn1/foo1 is passed to Accumulo server code via thrift
 * Accumulo server code looks at URI to determine where to move to, determines 
it has accumulo dir hdfs://nn1/accumulo
 * Moves files in hdfs://nn1/foo1 to hdfs://nn1/accumulo/tables/abc
 * Replaces hdfs://nn1/accumulo/tables/abc with ans://ns1/accumulo/tables/abc
 * Does bulk import of files in ans://ns1/accumulo/tables/abc

Is this how this should work?  The scenario above implies that Accumulo needs a 
dir on each namenode and a way of mapping URIs to the appropriate Accumulo dir.  
Need to work through this scenario w/ viewfs also.  
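
The path rewriting in the bulk import scenario can be sketched as pure string handling, with no real HDFS or thrift calls. All function names and the two lookup structures (the list of Accumulo dirs and the namenode-to-namespace map) are assumptions made for illustration.

```python
def authority(uri):
    """Return scheme plus host, e.g. 'hdfs://nn1' for 'hdfs://nn1/foo1'."""
    scheme, _, rest = uri.partition("://")
    return scheme + "://" + rest.split("/", 1)[0]


def qualify(path, default_fs):
    """Client side: turn a bare path into a full URI using local config."""
    if "://" in path:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


def accumulo_dir_for(uri, accumulo_dirs):
    """Server side: find the Accumulo dir on the same namenode as uri."""
    for candidate in accumulo_dirs:
        if authority(candidate) == authority(uri):
            return candidate
    raise LookupError(f"no accumulo dir on {authority(uri)}")


def plan_bulk_import(path, table_id, default_fs, accumulo_dirs, namespaces):
    """Return (source URI, move destination, ans URI) for a bulk import."""
    src = qualify(path, default_fs)
    base = accumulo_dir_for(src, accumulo_dirs)
    dest = base.rstrip("/") + "/tables/" + table_id
    root = authority(base)
    # Swap the namenode root for the hypothetical ans:// namespace name.
    ans = "ans://" + namespaces[root] + dest[len(root):]
    return src, dest, ans


if __name__ == "__main__":
    print(plan_bulk_import(
        "/foo1", "abc", "hdfs://nn1",
        ["hdfs://nn1/accumulo", "hdfs://nn2/accumulo"],
        {"hdfs://nn1": "ns1", "hdfs://nn2": "ns2"},
    ))
```

The `LookupError` branch is the case raised in the question above: a source directory on a namenode where Accumulo has no dir of its own.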


                
      was (Author: kturner):
    I was looking at some docs on viewfs.  If possible, I am thinking we should 
not do anything that would preclude using viewfs.   It seems like if URIs were 
supported for tablet dirs and files (along with a way to choose a tablet dir) 
that this would almost be enough to support viewfs.

{noformat}
  1;m srv:dir viewfs://clusterX/accumulo1/tables/abc
  1;m file:viewfs://ns1/accumulo1/tables/abc/F0000002.rf []    196,1

  1< srv:dir viewfs://clusterX/accumulo2/tables/abc
  1< file:viewfs://ns1/accumulo2/tables/abc/F0000003.rf []    196,1
{noformat}

If we want to further develop our own indirection layer, then maybe we should 
define our own URI prefix.   Something like ans://.  How independent should 
this URI be?  Something like ans://<namespace name>/<path> would assume that 
you know where to look <namespace name> up.  If the URI were like 
ans://<zookeepers>+<instance id>+<namespace name>/<path> then it would be more 
self contained.   I do not think it's necessary to make it self contained; it's 
for internal use and would be translated as needed.

I was thinking about how bulk import will work in this federated world.  Below 
is one way this could work.

 * Client calls import dir w/ /foo1
 * Accumulo client code uses local config to convert /foo1 to URI hdfs://nn1/foo1
 * hdfs://nn1/foo1 is passed to Accumulo server code via thrift
 * Accumulo server code looks at URI to determine where to move to, determines 
it has accumulo dir hdfs://nn1/accumulo.
 * moves files in hdfs://nn1/foo1 to hdfs://nn1/accumulo/tables/abc
 * Replaces hdfs://nn1/accumulo/tables/abc with ans://ns1/accumulo/tables/abc
 * Does bulk import of files in ans://ns1/accumulo/tables/abc

Is this how this should work?  The scenario above implies that Accumulo needs a 
dir on each namenode and a way of mapping URIs to the appropriate Accumulo dir.  
Need to work through this scenario w/ viewfs also.  



                  
> accumulo could work across HDFS instances, which would help it to scale past 
> a single namenode
> ----------------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-118
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-118
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: master, tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: ACCUMULO-118-01.txt
>
>   Original Estimate: 2,016h
>  Remaining Estimate: 2,016h
>
> Consider using full path names to files, which would allow the servers to 
> access the files on any HDFS file system.
> Work may exist elsewhere to run HDFS using a number of NameNode instances to 
> break up the namespace.
> We may need a pluggable strategy to determine namespace for new files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
