Marko, this is great! I always like it when you send out posts like this!

Best,
DQ
Sent from my iPhone

> On Apr 5, 2016, at 4:43 PM, Marko Rodriguez <okramma...@gmail.com> wrote:
>
> Hello,
>
> With the imminent release of TinkerPop 3.2.0, during our week-long code
> freeze, I took 3.2.0 for a spin on a 4-node Blade cluster using the
> Friendster graph, which is composed of 125 million vertices and 2.5 billion
> edges. TinkerPop 3.2.0 will release using Spark 1.6.1. Note that there were
> some issues in the initial testing around SPARK_WORKER_INSTANCES and
> SPARK_WORKER_CORES. The 1.5.2 settings I used previously were "too much" for
> 1.6.1. I toned it down a bit and things work smoothly, and interestingly
> enough, with seemingly less "firepower," we are getting better results
> (speed-wise). Enjoy the results.
>
> g.V().count() -- answer 125000000 (125 million vertices)
> - TinkerPop 3.0.0.MX: 2.5 hours
> - TinkerPop 3.0.0: 1.5 hours
> - TinkerPop 3.1.1: 23 minutes
> - TinkerPop 3.2.0: 6.8 minutes (Spark 1.5.2)
> - TinkerPop 3.2.0: 5.5 minutes (Spark 1.6.1)
>
> g.V().out().count() -- answer 2586147869 (2.5 billion length-1 paths (i.e.
> edges))
> - TinkerPop 3.0.0.MX: unknown
> - TinkerPop 3.0.0: 2.5 hours
> - TinkerPop 3.1.1: 1.1 hours
> - TinkerPop 3.2.0: 13 minutes (Spark 1.5.2)
> - TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
>
> g.V().out().out().count() -- answer 640528666156 (640 billion length-2 paths)
> - TinkerPop 3.0.0.MX: unknown
> - TinkerPop 3.0.0: unknown
> - TinkerPop 3.1.1: unknown
> - TinkerPop 3.2.0: 55 minutes (Spark 1.5.2)
> - TinkerPop 3.2.0: 50 minutes (Spark 1.6.1)
>
> g.V().out().out().out().count() -- answer 215664338057221 (215 trillion
> length-3 paths)
> - TinkerPop 3.0.0.MX: 12.8 hours
> - TinkerPop 3.0.0: 8.6 hours
> - TinkerPop 3.1.1: 2.4 hours
> - TinkerPop 3.2.0: 1.6 hours (Spark 1.5.2)
> - TinkerPop 3.2.0: 1.5 hours (Spark 1.6.1)
>
> g.V().out().out().out().out().count() -- answer 83841426570464575 (83
> quadrillion length-4 paths)
> - TinkerPop 3.0.0.MX: unknown
> - TinkerPop 3.0.0: unknown
> - TinkerPop 3.1.1: unknown
> - TinkerPop 3.2.0: unknown (Spark 1.5.2)
> - TinkerPop 3.2.0: 2.1 hours (Spark 1.6.1)
>
> g.V().out().out().out().out().out().count() -- answer -2280190503167902456 !! I
> blew the long space -- 64-bit overflow.
> - TinkerPop 3.0.0.MX: unknown
> - TinkerPop 3.0.0: unknown
> - TinkerPop 3.1.1: unknown
> - TinkerPop 3.2.0: unknown (Spark 1.5.2)
> - TinkerPop 3.2.0: 2.8 hours (Spark 1.6.1)
>
> Next, group()-step has been redesigned to be much more efficient in OLAP mode
> when the by()-value traversal maintains a ReducingBarrierStep (e.g. count,
> sum, max, min, fold, mean, ...). Thus, prior to this moment, something like:
>
> g.V().group().by(outE().count()).by(count())
>
> // this is equivalent to g.V().map(outE().count()).groupCount(),
> // but I wanted to test group()'s new reducer model.
>
> ...would have failed miserably on such a large graph.
> However, with TinkerPop 3.2.0, because the second by() (the value traversal)
> maintains a ReducingBarrierStep (count()), we get on-the-fly reductions, which
> limit memory usage and ensure that such grouping traversals now work at scale
> in OLAP.
>
> g.V().group().by(outE().count()).by(count()) -- answer below.
> - TinkerPop 3.2.0: 12 minutes (Spark 1.6.1)
>
> ==>[0:68889802, 1:14490104, 2:5924264, 3:3630690, 4:2520455, 5:1887641,
> 6:1499489, 7:1235456, 8:1048559, 9:909576, 10:802183, 11:716357, 12:644813,
> 13:590507, 14:542157, 15:501000, 16:465449, 17:434955, 18:407146, 19:383250,
> 20:362687, 21:341529, 22:325269, 23:308506, 24:295382, 25:282257, 26:270540,
> 27:259267, 28:248882, 29:241110, 30:240857, 31:221426, 32:213362, 33:206135,
> 34:200053, 35:193185, 36:186947, 37:181301, 38:176271, 39:171148, 40:166312,
> 41:161646, 42:156552, 43:153162, 44:148875, 45:145339, 46:141780, 47:138058,
> 48:135479, 49:131795, 50:128793, 51:126391, 52:123254, 53:121081, 54:118758,
> 55:115864, 56:113936, 57:110845, 58:108192, 59:106723, 60:104243, 61:102829,
> 62:100759, 63:98617, 64:96827, 65:95385, 66:93629, 67:92324, 68:90519,
> 69:88766, 70:87682, 71:85794, 72:84279, 73:83389, 74:81654, 75:80978,
> 76:78906, 77:78126, 78:76857, 79:75987, 80:75312, 81:73354, 82:72901,
> 83:71195, 84:70463, 85:69502, 86:68107, 87:66984, 88:65986, 89:65349,
> 90:64568, 91:63761, 92:63283, 93:62092, 94:61089, 95:60195, 96:59655,
> 97:58788, 98:57847, 99:56935, 100:57341, 101:55483, 102:54973, 103:54610,
> 104:53367, 105:53699, 106:52948, 107:52060, 108:51386, 109:51032, 110:50442,
> 111:49429, 112:48994, 113:48790, 114:48250, 115:47808, 116:47517, 117:47024,
> 118:46299, 119:45855, 120:45529, 121:45262, 122:44453, 123:43738, 124:43768,
> 125:43257, 126:42852, 127:41977, 128:41580, 129:41091, 130:41027, 131:40569,
> 132:40019, 133:39416, 134:39448, 135:38935, 136:38228, 137:37863, 138:37641,
> 139:37261, 140:36908, 141:36326, 142:36090, 143:35654, 144:35610, 145:34760,
> 146:34946,
> 147:34355, 148:33948, 149:33946, 150:33341, 151:33193, 152:32877,
> 153:32440, 154:32268, 155:31728, 156:31627, 157:30762, 158:30625, 159:30233,
> 160:30345, 161:29881, 162:29851, 163:29523, 164:29081, 165:28844, 166:28402,
> 167:28053, 168:27706, 169:27623, 170:27502, 171:27156, 172:27112, 173:26538,
> 174:26578, 175:26187, 176:25951, 177:25572, 178:25297, 179:25441, 180:24653,
> 181:24935, 182:24478, 183:24262, 184:23926, 185:24006, 186:23499, 187:23317,
> 188:22860, 189:22704, 190:22441, 191:22565, 192:22164, 193:22105, 194:21728,
> 195:21870, 196:21431, 197:21395,
> ...
>
> Take care,
> Marko.
>
> http://markorodriguez.com
> --
> You received this message because you are subscribed to the Google Groups
> "Gremlin-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to gremlin-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/gremlin-users/0F921BDF-E8C6-4A90-B479-68090E8AAEC5%40gmail.com.
> For more options, visit https://groups.google.com/d/optout.
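
A side note on the negative count in the last path traversal above: a minimal Python sketch of what "blew the long space" means, assuming the count is accumulated in a signed 64-bit Java long that wraps silently. The helper names (wrap_to_long, candidate_true_counts) are made up for illustration and are not part of TinkerPop. The sketch only shows which non-negative values are consistent with the reported -2280190503167902456; it cannot tell which of them is the real path count.

```python
from typing import List

# Assumption: the traversal result is held in a Java long (signed 64-bit),
# which wraps around modulo 2**64 instead of raising an error.
LONG_BITS = 64
MODULUS = 1 << LONG_BITS                # 2**64
LONG_MAX = (1 << (LONG_BITS - 1)) - 1   # largest value a Java long can hold


def wrap_to_long(value: int) -> int:
    """Reduce an arbitrary integer to the signed 64-bit range (two's complement)."""
    value &= MODULUS - 1
    return value - MODULUS if value > LONG_MAX else value


def candidate_true_counts(reported: int, how_many: int = 3) -> List[int]:
    """Smallest non-negative integers that wrap to `reported`."""
    base = reported % MODULUS
    return [base + k * MODULUS for k in range(how_many)]


reported = -2280190503167902456  # value printed in the post

for candidate in candidate_true_counts(reported):
    # Every candidate exceeds LONG_MAX and maps back to the reported value.
    print(candidate, "->", wrap_to_long(candidate))
```

Any exact answer would need arbitrary-precision arithmetic on the counting side; the sketch above only narrows the result to a family of values that differ by multiples of 2**64.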