[
https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14103048#comment-14103048
]
Colin Patrick McCabe commented on HDFS-6634:
--------------------------------------------
{code}
+ @Idempotent
+ public long getCurrentTxid() throws IOException;
+
+ @Idempotent
+ public EventsList getEditsFromTxid(long txid) throws IOException;
{code}
Some JavaDoc would be nice here. Also, what about renaming "getCurrentTxid" to
"getCurrentEditLogTxId"?
It seems like you're now returning a {{JournalNodeEditLogManifest}} object in
most places that formerly returned a {{RemoteEditLogManifest}}.
{code}
+ if (state == State.CLOSED) {
+ throw new IllegalStateException("Cannot call selectInputStreams() on " +
+ "closed FSEditLog");
+ }
{code}
{{Preconditions.checkState}} might look nicer here
{code}
+ private static FSEditLogOp readOp(EditLogInputStream elis)
+ throws IOException {
+ try {
+ return elis.readOp();
+ // we can get the below two exceptions if a segment is deleted
+ // (because we have accumulated too many edits) or (for the local
journal/
+ // no-QJM case only) if a in-progress segment is finalized under us ...
+ // no need to throw an exception back to the client in this case
+ } catch (FileNotFoundException e) {
+ return null;
+ } catch (TransferFsImage.HttpGetFailedException e) {
+ return null;
+ }
+ }
{code}
Let's add some {{LOG.debug}} messages here in the exception cases.
{{inotify.proto}}: can you add a constant, INVALID_TXID or something to this
file? That way you could use this instead of a raw -1 everywhere. Also,
you're sometimes treating txid as uint64, sometimes as int64... since Java
doesn't have unsigned, and we're never gonna get to 2**63 txids anyway, let's
just make it signed everywhere to avoid confusion.
It seems like HDFS-6777 is rolled into this patch. Are you planning on doing
HDFS-6777 first, or as part of this patch? It might be nice to do it
separately.
{code}
- long accFirstTxId = acc.get(0).getFirstTxId();
+ EditLogInputStream accFirst = acc.get(0);
+ long accFirstTxId = accFirst.getFirstTxId();
if (accFirstTxId == elis.getFirstTxId()) {
- acc.add(elis);
+ // if we have a finalized log segment available at this txid,
+ // we should throw out all in-progress segments at this txid
+ if (elis.isInProgress()) {
+ if (accFirst.isInProgress()) {
+ acc.add(elis);
+ }
+ } else {
+ if (accFirst.isInProgress()) {
+ acc.clear();
+ }
+ acc.add(elis);
+ }
{code}
Why is this necessary? It seems like your caller code could filter out
in-progress edit log segments if that's what you want. Isn't this also
problematic from an HA perspective (i.e. if directory X has a finalized log
segment, but it's unreadable, you *want* to start reading from the unfinalized
one.)
{code}
+ public static Event[] translate(FSEditLogOp op) {
+ if (op.opCode == FSEditLogOpCodes.OP_ADD) {
...
{code}
Should use a switch statement here
{code}
+ public static enum EventType {
+ CREATE, CLOSE, REOPEN, RENAME, METADATA, UNLINK
+ }
{code}
Maybe "REOPEN" would be better as "APPEND". Or at least "OPEN_FOR_APPEND".
It's not my favorite terminology, but it's what we've been using for a while.
Also, we should document somewhere that open-for-read operations don't show up
here.
> inotify in HDFS
> ---------------
>
> Key: HDFS-6634
> URL: https://issues.apache.org/jira/browse/HDFS-6634
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs-client, namenode, qjm
> Reporter: James Thomas
> Assignee: James Thomas
> Attachments: HDFS-6634.2.patch, HDFS-6634.3.patch, HDFS-6634.4.patch,
> HDFS-6634.5.patch, HDFS-6634.patch, inotify-design.2.pdf,
> inotify-design.3.pdf, inotify-design.pdf, inotify-intro.2.pdf,
> inotify-intro.pdf
>
>
> Design a mechanism for applications like search engines to access the HDFS
> edit stream.
--
This message was sent by Atlassian JIRA
(v6.2#6252)