Hi,
I've been testing my search index on my 16-node ARM system and have been
running into some strange behavior. The cool part is that the locale
partitioning concept seems to work well; the downside is that the system is
very slow. I've rewritten the approach a few different ways and haven't
made a dent, so I wanted to ask a few questions.
On the ARM processors, I can only use FIFO and can't optimize (--fast
doesn't work). Is this going to significantly affect cross-locale
performance?
I've looked at the generated C code and tried to minimize the _comm_
operations in core methods, but that doesn't seem to help. Network usage is
still quite low (100K/s) while CPUs are pegged. Are there any profiling
tools I can use to understand what might be going on here?
Generally, on my laptop or a single node, I can index about 1.1 million
records in under 10 s. With 16 nodes, it takes 10 min to index 100k records.
I'm wondering if there's some systemic issue at play here and how I can
investigate further.
Thanks!
Brian
_______________________________________________
Chapel-developers mailing list
Chapel-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/chapel-developers