Re: How to Run High Capacity Tor Relays
> Also, afaik, zero people in the wild are actively running Tor with any
> crypto accelerator. May be a very painful process... I'm not really
> interested in documenting it unless it's proven to scale by actual use.
> I want this document to end up with tested and reproduced results
> only. You know, Science. Not computerscience ;)

There was a _very_ interesting, long and detailed discussion of this about a year ago on this list. I really do think some subset of that discussion should be included in your lore, at the very least the parts pertaining to the built-in crypto acceleration included in recent SPARC CPUs, which appear to be the only non-painful way to make this work. My impression was that a significant boost could be had by accelerating OpenSSL using these on-chip features...

*** To unsubscribe, send an e-mail to majord...@torproject.org with "unsubscribe or-talk" in the body. http://archives.seul.org/or/talk/
Re: How to Run High Capacity Tor Relays
On Wed, Sep 1, 2010 at 2:28 PM, John Case <c...@sdf.lonestar.org> wrote:
> ... I really do think some subset of that discussion should be
> included in your lore, at the very least the parts pertaining to the
> built-in crypto acceleration included in recent sparc CPUs, which
> appear to be the only non-painful way to make this work.

if you're running a high capacity relay you likely don't need hw acceleration because:

a. you're on a fast server with a relatively modern processor to get into the high capacity game. assembly-optimized crypto is pretty fast on these systems.

b. the compression, buffer management, and other aspects of Tor are just as significant as the crypto-specific parts on such a server.

c. the crypto hw needed to be effective is expensive, at least a grand, or inside specialized server processors you're unlikely to have in your dedicated / leased server hardware.

this is not to say it isn't useful. it's useful in all kinds of ways, ranging from efficiency improvements and side channel attack resistance to entropy sources for strong session key / nonce generation. however, i doubt hardware crypto will prove useful for anyone in the top tier of relay capacity to drastically improve their throughput or efficiency overall, given the current architecture of Tor itself.

and, as mentioned, there have been a number of threads on the subject, and widely expanded OpenSSL engine support added since last year for those interested in experimenting with hw acceleration.

best regards,
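[Editor's sketch] Point (a) above is easy to check empirically: OpenSSL ships a benchmark mode, and its engine subsystem is where hardware offload would appear. These are standard openssl subcommands, though exact output varies by version:

```shell
# List the crypto engines OpenSSL knows about; hardware offload
# (e.g. an engine wrapping on-chip acceleration) would show up here.
openssl engine

# Benchmark AES-128-CBC through the EVP interface, which picks up
# whatever assembly-optimized implementation the build provides.
openssl speed -evp aes-128-cbc
```

On a reasonably modern x86_64 core this typically reports on the order of hundreds of MB/s for large blocks, consistent with the claim that software crypto is not the first bottleneck on a fast relay.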
Re: How to Run High Capacity Tor Relays
On 09/01/2010 02:28 PM, John Case wrote:
>> Also, afaik, zero people in the wild are actively running Tor with
>> any crypto accelerator. May be a very painful process... I'm not
>> really interested in documenting it unless it's proven to scale by
>> actual use. I want this document to end up with tested and reproduced
>> results only. You know, Science. Not computerscience ;)
>
> There was a _very_ interesting, long and detailed discussion of this
> about a year ago on this list. I really do think some subset of that
> discussion should be included in your lore, at the very least the
> parts pertaining to the built-in crypto acceleration included in
> recent sparc CPUs, which appear to be the only non-painful way to
> make this work. My impression was that a significant boost could be
> had by accelerating OpenSSL using these on-chip features...

If you're using a fast CPU, it's almost not worth the trouble to bother with hardware acceleration.

All the best,
Jacob
Re: How to Run High Capacity Tor Relays
I should have said this in my first post, but I believe that all subsequent replies should go to tor-relays. This should be the last post discussing technical details of relay operation on or-talk.

Thus spake coderman (coder...@gmail.com):

>> net.ipv4.tcp_keepalive_time = 1200
>
> ^- who uses keepalive? :)

Hrmm, Tor does its own application-level keepalive. Perhaps that's how this got merged in by confusion. Or maybe, like many of these, it was just a blanket cut-and-paste move out of desperation to try to increase capacity. The whole superset-of-voodoo thing.

>> net.netfilter.nf_conntrack_tcp_timeout_established=7200
>> net.netfilter.nf_conntrack_checksum=0
>> net.netfilter.nf_conntrack_max=131072
>> net.netfilter.nf_conntrack_tcp_timeout_syn_sent=15
>
> ^- best to just disable conntrack altogether if you can. -J NOTRACK
> in the raw table as appropriate. you're going to eat up lots of
> memory with a decent nf|ip_conntrack_max
> ( check /proc/sys/net/ipv4/netfilter/ip_conntrack_max , etc )

Will this remove the ability to do PREROUTING DNAT rules? I know a lot of Tor nodes forward ports and even IPs around. Good suggestion though. Perhaps we should mention both options in the final draft.

> [...] some dupes in here?
>> net.ipv4.ip_forward=1
>> ...
>> net.ipv4.conf.default.forwarding=1
>> net.ipv4.conf.default.proxy_arp = 1
>
> ^- BAD! this should not be enabled by default unless you're actually
> routing specifically to guest vm's or between interfaces or
> something. if you enable forwarding by default, someone may use you
> to relay some malicious traffic.

Oh shit, that is a relic of Moritz's config. He is also planning to provide VPN and VPS services. Good catch. Also, does DNAT count as forwarding for the ip_forward option?

>> == Did I leave anything out? ==
>> Well, did I?
>
> i'd love to see an sca6000 accelerated node. been working with these
> recently but unfortunately they're allocated for other work...
> (most of the other crypto hw is going to be bus / implementation
> limited to less than what a beefy 64bit modern server can provide, so
> of little utility in this context.)

I'd love to hear Roger and Nick's comments on this, but isn't it possible this might also bottleneck well before 1Gbit? I am worried it may depend largely on the architecture of the card and our use of openssl. Their docs claim up to 1Gbit, but this could be using highly parallelized processing, which Tor cannot really do, as I understand it.

Personally I think the hyperthreading option is the lowest hanging fruit for maxing out a single Tor relay process at the lowest cost.

Also, afaik, zero people in the wild are actively running Tor with any crypto accelerator. May be a very painful process... I'm not really interested in documenting it unless it's proven to scale by actual use. I want this document to end up with tested and reproduced results only. You know, Science. Not computerscience ;)

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs
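[Editor's sketch] The NumCPUs idea mentioned above is a torrc setting. A minimal fragment of the relevant relay lines follows; NumCPUs is a real torrc option, while the nickname and port here are made-up placeholders, not recommendations:

```
# torrc fragment (sketch). NumCPUs is a real torrc option; the
# nickname and port below are placeholders.
Nickname ExampleRelay
ORPort 9001
NumCPUs 2
```

With hyperthreading enabled in the BIOS, NumCPUs 2 lets the cpuworker threads occupy both hardware threads of a core, which is where the integer-math (crypto) bonus discussed above would show up.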
How to Run High Capacity Tor Relays
After talking to Moritz and Olaf privately and asking them about their nodes, and after running some experiments with some high capacity relays, I've begun to realize that running a fast Tor relay is a pretty black art, with a lot of ad-hoc practice. Only a few people know how to do it, and if you just use Linux and Tor out of the box, your relay will likely underperform on 100Mbit links and above.

In the interest of trying to help grow and distribute the network, my ultimate plan is to try to collect all of this lore, use Science to divine out what actually matters, and then write a more succinct blog post about it. However, that is a lot of work. It's also not totally necessary to do all this work, when you can get a pretty good setup with a rough superset of all of the ad-hoc voodoo. This post is thus about that voodoo.

Hopefully others will spring forth from the darkness to dump their own voodoo in this thread, as I suspect there is one hell of a lot of it out there, some (much?) of which I don't yet know. Likewise, if any blasphemous heretic wishes to apply Science to this voodoo, they should yell out "Stand Back, I'm Doing Science!" (at home please, not on this list) and run some experiments to try to eliminate options that are useless to Tor performance. Or cite academic research papers. (But that's not Science, that's computerscience - which is a religion like voodoo, but with cathedrals).

Anyway, on with the draft:

== Machine Specs ==

First, you want to run your OS in x64 mode because openssl should do crypto faster in 64bit. Tor is currently not fully multithreaded, and tends not to benefit beyond 2 cores per process. Even then, the benefit is still marginal beyond just 1 core.

64bit Tor nodes require about one 2Ghz Xeon/Core2 core per 100Mbit of capacity. Thus, to fill an 800Mbit link, you need at least a dual socket, quad core CPU config. You may be able to squeeze a full gigabit out of one of these machines.
As far as I know, no one has ever done this with Tor, on any one machine. The i7s also just came out in this form factor, and can do hyperthreading (previous models may list 'ht' in cpuinfo, but actually don't support it). This should give you a decent bonus if you set NumCPUs to 2, since HT tends to work better with pure integer math (like crypto). We have not benchmarked this config yet, but I suspect it should fill a gigabit link fairly easily, possibly approaching 2Gbit.

At full capacity, exit node Tor processes running at this rate consume about 500M of ram. You want to ensure your ram speed is sufficient, but most newish hardware is good. Using this chart:

https://secure.wikimedia.org/wikipedia/en/wiki/List_of_device_bandwidths#Memory_Interconnect.2FRAM_buses

you can do the math and see that with a dozen memcpys in each direction, you come out needing DDR2 to be able to push 1Gbit full duplex.

As far as ethernet cards, the Intel e1000e *should* be theoretically good, but they seem to fail to properly balance IRQs across multiple CPUs on recent kernels, which can cause you to bottleneck at 100% CPU on one core. At least that has been Moritz's experience. In our experiments, the RTL-8169 works fine (once tweaked, see below).

== System Tweakscript Wibbles and Config Splatters ==

First, you want to ensure that you run no more than 2 Tor instances per IP. Any more than this and clients will ignore them.

Next, paste the following smattering into the shell (or just read it and make your own script):

# Set the hard limit of open file descriptors really high.
# Tor will also potentially run out of ports.
ulimit -SHn 65000

# Set the txqueuelen high, to prevent premature drops
ifconfig eth0 txqueuelen 2

# Tell our ethernet card (interrupt found from /proc/interrupts)
# to balance its IRQs across one whole CPU socket (4 cpus, mask 0f).
# You only want one socket for optimal ISR and buffer caching.
# Note that e1000e does NOT seem to obey this, but RTL-8169 will.
echo 0f > /proc/irq/17/smp_affinity

# Make sure you have auxiliary nameservers. I've seen many ISP
# nameservers fall over under load from fast tor nodes, both on our
# nodes and from scans. Or run caching named and closely monitor it.
echo "nameserver 8.8.8.8" >> /etc/resolv.conf
echo "nameserver 4.2.2.2" >> /etc/resolv.conf

# Load an amalgam of gigabit-tuning sysctls from:
# http://datatag.web.cern.ch/datatag/howto/tcp.html
# http://fasterdata.es.net/TCP-tuning/linux.html
# http://www.acc.umu.se/~maswan/linux-netperf.txt
# http://www.psc.edu/networking/projects/tcptune/#Linux
# and elsewhere...
# We have no idea which of these are needed yet for our actual use
# case, but they do help (especially the nf-contrack ones):
sysctl -p - << EOF
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.rmem_max = 1048575
net.core.wmem_max = 1048575
net.ipv4.ip_local_port_range = 1025 61000
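[Editor's sketch] The "do the math" step above can be made explicit. Taking the post's figure of a dozen memcpys in each direction at face value, the rest is plain unit conversion:

```shell
#!/bin/sh
# 1 Gbit/s full duplex = 2 Gbit/s total, or 250 MB/s of payload.
total_bits=2000000000
payload=$((total_bits / 8))          # 250000000 bytes/s

# Each of the ~12 copies reads and writes the buffer once,
# so every payload byte crosses the memory bus 24 times.
copies=12
mem_traffic=$((payload * copies * 2))

echo "$((mem_traffic / 1000000)) MB/s of memory bandwidth needed"
```

That works out to about 6000 MB/s, just under the 6400 MB/s peak of a single DDR2-800 channel, which matches the post's conclusion that DDR2 is enough for 1Gbit full duplex.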
Re: How to Run High Capacity Tor Relays
On Tue, Aug 24, 2010 at 8:27 AM, Mike Perry <mikepe...@fscked.org> wrote:
> ...
> # Set the hard limit of open file descriptors really high.
> # Tor will also potentially run out of ports.
> ulimit -SHn 65000

typically in /etc/security/limits.conf. i like to append:

*    soft    nofile    4096
*    hard    nofile    65535

but on big servers use .25mm as hard limit. (Tor not this fd hungry, 64k is fine)

> # Load an amalgam of gigabit-tuning sysctls from:
> ...
> # We have no idea which of these are needed yet for our actual use
> # case, but they do help (especially the nf-contrack ones):

you probably want to save in /etc/sysctl.conf , then sysctl -p ...

> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.core.netdev_max_backlog = 2500
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.core.rmem_max = 1048575
> net.core.wmem_max = 1048575

^- these are important and useful

> net.ipv4.ip_local_port_range = 1025 61000

^- that's a little aggressive, better to set FIN timeout lower. i like 5000 to 65535 ephemeral port range

> net.ipv4.tcp_max_syn_backlog = 10240
> net.ipv4.tcp_fin_timeout = 30

^- i like a fin timeout of 3-4 seconds on a busy server, otherwise you've got lots of resources tied up in sockets waiting to die... Tor not quite so volatile as some services, so perhaps 30 is fine.

> net.ipv4.tcp_keepalive_time = 1200

^- who uses keepalive? :)

> net.netfilter.nf_conntrack_tcp_timeout_established=7200
> net.netfilter.nf_conntrack_checksum=0
> net.netfilter.nf_conntrack_max=131072
> net.netfilter.nf_conntrack_tcp_timeout_syn_sent=15

^- best to just disable conntrack altogether if you can. -J NOTRACK in the raw table as appropriate. you're going to eat up lots of memory with a decent nf|ip_conntrack_max ( check /proc/sys/net/ipv4/netfilter/ip_conntrack_max , etc )

[...] some dupes in here?

> net.ipv4.ip_forward=1
> ...
> net.ipv4.conf.default.forwarding=1
> net.ipv4.conf.default.proxy_arp = 1

^- BAD!
this should not be enabled by default unless you're actually routing specifically to guest vm's or between interfaces or something. if you enable forwarding by default, someone may use you to relay some malicious traffic. were these cut and paste errors?

remember to disable forwarding first, before tuning other parameters, as changing this value will reset some others back to defaults. (!!)

> net.ipv4.tcp_syncookies = 1

^- not usually worth the overhead?

> net.ipv4.conf.all.rp_filter = 1

^- note that you need to be precise with your routing metrics and such for multi-homed with rp_filter enabled. also, this costs resources, and if you can avoid it, do so.

> net.ipv4.conf.default.send_redirects = 1
> net.ipv4.conf.all.send_redirects = 0

^- don't know if these are too useful either. i prefer to limit ICMP beyond this. (perhaps related to forwarding defaults above.) Ex:

echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_all
echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects
echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses

> == Did I leave anything out? ==
> Well, did I?

i'd love to see an sca6000 accelerated node. been working with these recently but unfortunately they're allocated for other work... (most of the other crypto hw is going to be bus / implementation limited to less than what a beefy 64bit modern server can provide, so of little utility in this context.)

best regards,
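[Editor's sketch] The "-J NOTRACK in the raw table" advice above, spelled out, looks something like the following. The NOTRACK target and raw table are real iptables features; ports 9001 and 9030 are placeholder ORPort/DirPort values, not part of the original post:

```
# Skip conntrack for relay traffic; raw-table rules are evaluated
# before connection tracking creates any state.
iptables -t raw -A PREROUTING -p tcp --dport 9001 -j NOTRACK
iptables -t raw -A PREROUTING -p tcp --dport 9030 -j NOTRACK
iptables -t raw -A OUTPUT -p tcp --sport 9001 -j NOTRACK
iptables -t raw -A OUTPUT -p tcp --sport 9030 -j NOTRACK
```

One caveat: NAT (including PREROUTING DNAT) depends on conntrack, so untracked flows bypass it entirely. That trade-off is exactly the "both options" question raised earlier in the thread for operators who forward ports or IPs around.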