[ https://issues.apache.org/jira/browse/HADOOP-15461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503867#comment-16503867 ]
Steve Loughran commented on HADOOP-15461:
-----------------------------------------
OK; I'll keep an eye on it.
FWIW, the main users of my GitHub winutils builds are people trying to run
Spark in local mode on their own machines. If we could get localFS to work
without needing winutils for its day-to-day operations, that would make those
users happiest of all, because they would no longer need to worry about path
setup. Spark RDD reads and writes don't need the full attempt to mimic POSIX,
with permissions, symlinks, etc.
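
To illustrate the point about localFS, here is a minimal sketch of doing symlink creation and basic access checks in-process with the standard java.nio.file API rather than by launching an external helper. The class and method names (NioLocalFsSketch, trySymlink, describeAccess) are hypothetical and are not Hadoop code; this is only an assumption about what a winutils-free path could look like.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Hadoop.
final class NioLocalFsSketch {

    /** Tries to create a symlink in-process; returns false if it cannot be created. */
    static boolean trySymlink(Path link, Path target) {
        try {
            Files.createSymbolicLink(link, target);
            return true;
        } catch (UnsupportedOperationException | IOException | SecurityException e) {
            // The file system may not support symlinks, or the process may lack
            // the privilege to create them (a common situation on Windows).
            return false;
        }
    }

    /** Summarizes basic access as three characters, without POSIX mode bits. */
    static String describeAccess(Path p) {
        return (Files.isReadable(p) ? "r" : "-")
             + (Files.isWritable(p) ? "w" : "-")
             + (Files.isExecutable(p) ? "x" : "-");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nio-sketch");
        Path target = Files.createFile(dir.resolve("data.txt"));
        Path link = dir.resolve("data-link");
        System.out.println("symlink created: " + trySymlink(link, target));
        System.out.println("access to target: " + describeAccess(target));
    }
}
```

The sketch treats a failed symlink as a soft failure, which matches the observation that plain reads and writes do not depend on it.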
> Improvements over the Hadoop support with Windows
> -------------------------------------------------
>
> Key: HADOOP-15461
> URL: https://issues.apache.org/jira/browse/HADOOP-15461
> Project: Hadoop Common
> Issue Type: New Feature
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> This Jira tracks the effort to improve the interaction between Hadoop and
> Windows Server.
> * Move away from an external process (winutils.exe) for native code:
> ** Replace it with native Java APIs (e.g., for symlinks);
> ** Replace it with something like JNI;
> * Fix the build system to fully leverage cmake instead of msbuild;
> * Possibly other improvements;
> * Fix memory and handle leaks.
>
> I did a quick investigation of the performance of WinUtils in YARN. On
> average, the NM calls WinUtils 4.76 times per second and 65.51 times per
> container.
>
> |WinUtils call|Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
> Measurement interval: 7 hours, 53 minutes and 48 seconds.
> Each execution of WinUtils performs around *140 IO ops*, of which 130 are DDL
> ops. At 4.761 calls per second, this means about *666.58* IO ops/second due
> to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI
> interface instead.
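
The rate figures quoted above can be re-derived from the request total and the measurement interval. The following is a small sketch of that arithmetic; the class and method names (WinUtilsRates, perSecond, ioOpsPerSecond) are made up for illustration, and the inputs are the numbers from the issue description.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper that reproduces the arithmetic from the issue description.
final class WinUtilsRates {

    /** Average number of requests per second over the interval. */
    static double perSecond(long requests, Duration interval) {
        return (double) requests / interval.getSeconds();
    }

    /** IO operations per second, given the IO ops performed by each WinUtils call. */
    static double ioOpsPerSecond(long requests, Duration interval, int ioOpsPerCall) {
        return perSecond(requests, interval) * ioOpsPerCall;
    }

    public static void main(String[] args) {
        // 7 hours, 53 minutes and 48 seconds = 28428 seconds
        Duration interval = Duration.ofHours(7).plusMinutes(53).plusSeconds(48);
        long totalRequests = 135_354L; // sum row of the table
        int ioOpsPerCall = 140;        // approximate figure from the description

        System.out.printf(Locale.ROOT, "%.3f calls/s, %.2f IO ops/s%n",
                perSecond(totalRequests, interval),
                ioOpsPerSecond(totalRequests, interval, ioOpsPerCall));
    }
}
```

With these inputs the sketch reproduces the 4.761 calls per second and roughly 666.58 IO ops per second stated in the description.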