[jira] [Commented] (ACCUMULO-578) consider using hdfs for the walog

Eric Newton (JIRA) Thu, 24 May 2012 09:32:30 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282602#comment-13282602 ]


Eric Newton commented on ACCUMULO-578:
--------------------------------------

More thinking about file GC: we can eliminate the possibility of removing a log 
file that is still in use by having the GC ask the tserver to remove the log, 
instead of deleting the file itself.

# tablet server opens the log file, in a directory named for the tserver 
address, and begins using it
# tablet server writes references to the log into the !METADATA table, as the 
log is used
# tablet server removes references to logs as tablets flush to disk
# gc asks the tserver to remove the file when it sees no !METADATA table 
references
#* the tablet server ignores the request if it is still using the log
# master will assign log sorts when it finds an unassigned tablet with log 
references
#* log sorts need to recover the lease on the file to prevent stray updates 
from appearing
#* log sorts should be monitored, perhaps made into a FATE operation
# once a tablet's logs have been sorted, the tablet is assigned by the master
# gc will remove sorted log entries when all references to the logs have been 
removed
# as always, checks against the !METADATA table have to use the special 
consistency checking iterators
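
A rough sketch of the removal handshake in steps 4 and 4a, in Python for brevity. This is only an illustration of the idea, not Accumulo code: the names TabletServer, remove_log and gc_pass are made up, the !METADATA table is reduced to a plain set of referenced log names, and the consistency checking from the last step is assumed away.

```python
from dataclasses import dataclass, field


@dataclass
class TabletServer:
    """Hypothetical stand-in for a tserver that owns its own log files."""
    address: str
    in_use: set = field(default_factory=set)  # logs still open for writing
    files: set = field(default_factory=set)   # logs in this tserver's directory

    def open_log(self, name):
        # Step 1: open the log and begin using it.
        self.files.add(name)
        self.in_use.add(name)

    def close_log(self, name):
        # The tserver has stopped writing to this log.
        self.in_use.discard(name)

    def remove_log(self, name):
        """Handle a removal request from the gc.

        Returns False (request ignored) if the log is still in use,
        True once the file has been removed.
        """
        if name in self.in_use:
            return False
        self.files.discard(name)
        return True


def gc_pass(metadata_refs, tservers):
    """Ask each tserver to remove logs that have no metadata references.

    metadata_refs is an assumed, simplified view of the !METADATA table:
    the set of log names that some tablet still references.
    """
    removed = []
    for ts in tservers:
        # sorted() copies the names, so remove_log may mutate ts.files.
        for name in sorted(ts.files):
            if name not in metadata_refs and ts.remove_log(name):
                removed.append((ts.address, name))
    return removed
```

The point of the design is that the gc never deletes a file on its own: the tserver is the only party that knows whether a log is open, so it makes the final decision, and a request that arrives too early is simply retried on a later gc pass.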

                
> consider using hdfs for the walog
> ---------------------------------
>
>                 Key: ACCUMULO-578
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-578
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: logger, tserver
>    Affects Versions: 1.5.0-SNAPSHOT
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>         Attachments: HDFS_WAL_states.pdf, NNOpsComparison.pdf, comparison.png
>
>
> Using HDFS for walogs would fix:
>  * ACCUMULO-84: any node can read the replicated files
>  * ACCUMULO-558: wouldn't need to monitor loggers
>  * ACCUMULO-544: log references wouldn't include hostnames
>  * ACCUMULO-423: wouldn't need to monitor loggers
>  * ACCUMULO-258: hdfs has load balancing already
> To implement it, we would need the ability to distribute log sorts.
> Continuing to use loggers helps us avoid:
>  * the hdfs pipeline strategy
>  * the lack of fine-grained insight when a single node makes dfs slow
>  * additional namenode pressure
> Loggers also give us flexibility: for example, we can add fadvise() calls to 
> the logger before HDFS supports it

