[ 
https://issues.apache.org/jira/browse/HADOOP-11029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118439#comment-14118439
 ] 

Colin Patrick McCabe commented on HADOOP-11029:
-----------------------------------------------

The original title here was "LocalFS Statistics performs thread local call per 
byte written."  This title is misleading in a few ways.  First of all, this is 
a generic Filesystem.java thing, not a LocalFS thing.  Secondly, it only 
"performs a thread local call per byte written" if you write a byte at a time.  
If you write a byte at a time, you will have other high overheads, such as 
system call overhead (for local FS).  Finally, I don't think ThreadLocal is the 
source of that locked instruction you pointed out.

That lock prefix is almost certainly coming from the fact that the fields 
inside Statistics are volatile, not from {{ThreadLocal#get}}.  The entire point 
of {{ThreadLocal}} is that it is local to a thread, so no cross-thread 
synchronization should be required to access it (in a good implementation).  
But we do require the volatile variables so that the updated statistics can be 
visible to other threads calling {{Statistics#get}}

The thread-local statistics code was added to alleviate a performance problem 
where we were taking a lock, adding to a counter, and then releasing the lock 
after every write or read operation.  It was a substantial improvement on that 
naive implementation.  See HDFS-5276 for details.

I'm not sure if Java offers a way to avoid those volatile variables.  As I 
mentioned, the reason the Statistics fields are volatile is so that other 
threads can read their values when we're calling {{Statistics#get}}.  Is there 
a way to do the memory barrier only on read?  If so, I'm not aware of it.  
Maybe we could cast to volatile?  Or is there a memory barrier instruction in 
Unsafe?  That would be the only way to avoid the volatile.

Can you put some numbers behind this?  If this is only an issue when you are 
writing a byte at a time, I'd be inclined to close this, since, as I mentioned, 
byte-at-a-time is a well-known anti-pattern with Java streams.

> FileSystem#Statistics uses volatile variables that must be updated on write 
> or read calls.
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11029
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11029
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.5.0, 2.6.0
>            Reporter: Gopal V
>         Attachments: local-fs-locking.png
>
>
> This code is there in the hot-path of IFile writer via RawLocalFileSystem.
> !local-fs-locking.png!
> From a preliminary glance, the lock prefix calls are coming from a 
> threadlocal.get() within FileSystem.Statistics 
> {code}
>    /**
>      * Get or create the thread-local data associated with the current thread.
>      */
>     private StatisticsData getThreadData() {
>       StatisticsData data = threadData.get();
>       if (data == null) {
>         data = new StatisticsData(
>             new WeakReference<Thread>(Thread.currentThread()));
>         threadData.set(data);
>         synchronized(this) {
>           if (allData == null) {
>             allData = new LinkedList<StatisticsData>();
>           }
>           allData.add(data);
>         }
>       }
>       return data;
>     }
>     /**
>      * Increment the bytes read in the statistics
>      * @param newBytes the additional bytes read
>      */
>     public void incrementBytesRead(long newBytes) {
>       getThreadData().bytesRead += newBytes;
>     }
> {code}
> This is incredibly inefficient when used from FSDataOutputStream
> {code}
>     public void write(int b) throws IOException {
>       out.write(b);
>       position++;
>       if (statistics != null) {
>         statistics.incrementBytesWritten(1);
>       }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to