Hi Chris,

We are trying to measure how fast the filemanager + crawler combination performs. Here is what we are trying to do:

* Total data to process: 262GB
* 3 file managers and 9 crawlers
* where 3 crawlers send file locations to a file manager to process the files
* We have our own schema running on a PostgreSQL database
* Custom H5 extractor using the h5dump utility

Questions:

1) I have tried using FileUtils.copyFile vs FileUtils.moveFile, but I don't see any difference in processing time. Both my LandingZone and Archive Area are located on the same filesystem (GPFS). It takes roughly 100 minutes to process the 262GB of data. Can you shed any light on why we don't see any performance change?

2) I also don't see any performance gain between running 2 FMs and 3 FMs. I thought I would see some gain due to concurrency. The same goes for multiple crawlers: I was hoping to see a pretty obvious performance change when I increase the number of crawlers. What are your thoughts on running things in parallel to increase performance?

3) Like I said earlier, we are running the crawler to push data to the file manager. If I run it that way, then the "data transfer" (copy or move) is happening on the crawler side. I cannot find any way to let the file manager handle the "data transfer" using one of your runtime options. Please let me know if you know how to do that.

We have enough processing power to run multiple FMs and crawlers for scalability, but for some reason the crawler is not scaling enough.

Regards
--
Chintu Mistry
NASA Goddard Space Flight Center
Bldg L40B, Room S776
Office: 240 684 0477
Mobile: 770 310 1047
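
P.S. To put the 100-minute figure in perspective, here is a rough sketch of the average throughput it implies. The class and method names are made up for illustration, and it assumes 1 GB = 1024 MB, since the figures above do not say which convention applies. Under that assumption the run averages about 45 MB/s across all 9 crawlers combined, or about 5 MB/s per crawler.

```java
public class Throughput {
    // Assumption: 1 GB = 1024 MB (binary convention); not stated in the email.
    static final double MB_PER_GB = 1024.0;

    /** Average throughput in MB/s for a run that processed totalGb in the given minutes. */
    static double mbPerSecond(double totalGb, double minutes) {
        if (minutes <= 0) {
            throw new IllegalArgumentException("minutes must be positive");
        }
        return (totalGb * MB_PER_GB) / (minutes * 60.0);
    }

    public static void main(String[] args) {
        // Figures from the email: 262GB processed in roughly 100 minutes by 9 crawlers.
        double aggregate = mbPerSecond(262, 100);
        System.out.printf("aggregate:   %.1f MB/s%n", aggregate);
        System.out.printf("per crawler: %.1f MB/s%n", aggregate / 9);
    }
}
```

This is only an average over the whole run, so it cannot say where the time goes (file transfer, h5dump extraction, or database writes).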
