Once the graph is built, edges are stored in parallel primitive arrays, so each edge should only take 20 bytes to store (srcId: Long, dstId: Long, attr: Int). Unfortunately, the current implementation in EdgePartitionBuilder uses an array of Edge objects as an intermediate representation for sorting, so each edge will additionally take something like 40 bytes during graph construction (srcId (8) + dstId (8) + attr (4) + uncompressed pointer (8) + object overhead (8) + padding (4)). So you'll need about 1.2 TB of memory in total (60 bytes * 20 billion). Does your cluster have this much memory?
If not, I've been meaning to write a custom sort routine that can operate directly on the three parallel arrays. This will reduce the memory requirements to about 400 GB. Ankur <http://www.ankurdave.com/>