[ https://issues.apache.org/jira/browse/HADOOP-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HADOOP-1283:
----------------------------------------
Attachment: EliminateUTF8.patch
This patch does all of the above except for item 5: I don't want to change the
image and edits log formats at this point.
AFAIK the UTF8 and BytesWritable serializations differ only in the type of the
length field: UTF8 writes it as a short, while BytesWritable writes it as an int.
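The following sketch makes the difference concrete. It mimics the two layouts with plain java.io rather than calling Hadoop's UTF8 and BytesWritable classes. It assumes ordinary ASCII path names, for which UTF8's character encoding and standard UTF-8 produce the same bytes. The class name LengthPrefixDemo and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two on-the-wire layouts, written with plain
// java.io instead of Hadoop's UTF8 and BytesWritable classes.
public class LengthPrefixDemo {

    // UTF8-style record: 2-byte (short) length, then the bytes.
    // A 2-byte length caps the payload at 65535 bytes; this sketch rejects longer input.
    static byte[] shortPrefixed(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too long for a 2-byte length");
        }
        return frame(payload, true);
    }

    // BytesWritable-style record: 4-byte (int) length, then the bytes.
    static byte[] intPrefixed(byte[] payload) {
        return frame(payload, false);
    }

    private static byte[] frame(byte[] payload, boolean shortLength) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            if (shortLength) {
                out.writeShort(payload.length); // big-endian, 2 bytes
            } else {
                out.writeInt(payload.length);   // big-endian, 4 bytes
            }
            out.write(payload);
        } catch (IOException e) {
            // An in-memory stream does not fail, but DataOutputStream declares IOException.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] name = "/user/foo/part-0".getBytes(StandardCharsets.UTF_8);
        System.out.println("payload bytes:        " + name.length);
        System.out.println("short-prefixed bytes: " + shortPrefixed(name).length);
        System.out.println("int-prefixed bytes:   " + intPrefixed(name).length);
    }
}
```

Running main prints 16, 18 and 20. The payload is identical in both records, and the BytesWritable-style record is two bytes longer because of its wider length field.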
For the name-node in-memory structures I use a subclass of BytesWritable called
StringBytesWritable.
It mostly contains conversion methods from/to String.
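The sketch below shows one possible shape for such a wrapper. The real StringBytesWritable extends org.apache.hadoop.io.BytesWritable. To stay runnable without Hadoop on the classpath, this version wraps a byte[] directly. The class name StringBytesSketch and its method names are hypothetical, not the patch's actual API. The class also defines equals, hashCode and compareTo over the raw bytes, which a type needs in order to serve as a map key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for StringBytesWritable: stores a name as UTF-8 bytes
// and converts to and from String on demand.
public class StringBytesSketch implements Comparable<StringBytesSketch> {
    private final byte[] bytes;

    // String -> bytes conversion happens once, at construction.
    public StringBytesSketch(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    // Bytes -> String conversion, only when a caller actually needs a String.
    public String getString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public int getLength() {
        return bytes.length;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof StringBytesSketch
            && Arrays.equals(bytes, ((StringBytesSketch) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    // Unsigned lexicographic byte order; for valid UTF-8 this matches code point order.
    @Override
    public int compareTo(StringBytesSketch other) {
        int n = Math.min(bytes.length, other.bytes.length);
        for (int k = 0; k < n; k++) {
            int a = bytes[k] & 0xFF;
            int b = other.bytes[k] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return bytes.length - other.bytes.length;
    }

    @Override
    public String toString() {
        return getString();
    }
}
```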
I removed the implementations of the deprecated obtainLock() and releaseLock()
methods in FSNamesystem. The methods now return OPERATION_FAILED.
Let me know if we need to keep the implementations. Otherwise we should remove
the methods themselves, along with the related data structures on the name-node
such as activeLocks.
> Eliminate internal UTF8 to String and vice versa conversions in the name-node.
> ------------------------------------------------------------------------------
>
> Key: HADOOP-1283
> URL: https://issues.apache.org/jira/browse/HADOOP-1283
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.12.0
> Reporter: Konstantin Shvachko
> Attachments: EliminateUTF8.patch
>
>
> We have internal conversions of those two types inside name-node code. One
> example:
> NameNode.complete(String src, String clientName)
> then it calls
> FSNamesystem.completeFile(new UTF8(src), new UTF8(clientName));
> which in turn finally calls
> FSDirectory.addNode(path.toString(), newNode )
> and in another place
> FSDirectory.getNode(src.toString())
> So we have several conversions of the same parameter back and forth during
> computation.
> We should keep the parameter type consistent within different methods.
> The question is, which type should be used: String or Text.
> From previous discussions I remember that Text is more efficient in space and
> time for non-ASCII data. Here we mostly deal with file names and network
> addresses, which are ASCII.
> Does it make sense to use Text in this case?
> UTF8 is also used as a key in two maps: pendingCreates and leases.
> This should be replaced too.
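The following is a simplified sketch of the quoted call chain. The path stays a String from the RPC entry point down to the directory, so no UTF8 objects are created and converted back along the way. The pendingCreates and leases maps are keyed by String instead of UTF8. All class names, the map-based directory, and the one-file-per-holder lease bookkeeping are hypothetical stand-ins for NameNode, FSNamesystem and FSDirectory. String is used here only because it is the type the entry point already receives. This is not an answer to the String-vs-Text question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for FSDirectory: nodes indexed by their String path.
class DirectorySketch {
    private final Map<String, Long> nodes = new TreeMap<>();

    void addNode(String path, long length) {
        nodes.put(path, length); // no path.toString() round trip through UTF8
    }

    Long getNode(String path) {
        return nodes.get(path);
    }
}

// Hypothetical stand-in for FSNamesystem with String-keyed maps.
class NamesystemSketch {
    final DirectorySketch dir = new DirectorySketch();
    final Map<String, Long> pendingCreates = new HashMap<>(); // path -> length so far
    final Map<String, String> leases = new HashMap<>();       // holder -> path (simplified: one file per holder)

    void startFile(String src, String holder) {
        pendingCreates.put(src, 0L);
        leases.put(holder, src);
    }

    boolean completeFile(String src, String holder) {
        // Reject if the file is not being created, or not by this holder.
        if (!pendingCreates.containsKey(src) || !src.equals(leases.get(holder))) {
            return false;
        }
        long length = pendingCreates.remove(src);
        leases.remove(holder);
        dir.addNode(src, length);
        return true;
    }
}

// Hypothetical stand-in for NameNode: parameters arrive as String and pass through unchanged.
public class NameNodeSketch {
    final NamesystemSketch namesystem = new NamesystemSketch();

    public boolean complete(String src, String clientName) {
        return namesystem.completeFile(src, clientName);
    }

    public static void main(String[] args) {
        NameNodeSketch nn = new NameNodeSketch();
        nn.namesystem.startFile("/user/foo/part-0", "client-1");
        System.out.println(nn.complete("/user/foo/part-0", "client-1"));
    }
}
```

Because every layer takes the same type, the lookup that used to need src.toString() can use src directly.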
--
This message is automatically generated by JIRA.