Hi all, I'm using NumPy to read and process data from ASCII UCD files. This is a file format for describing unstructured finite-element meshes.
Most of the file consists of rectangular, numerical text matrices, easily and efficiently read with loadtxt(). But there is one particularly nasty section that consists of matrices with variable numbers of columns, like this:

    # index property type nodes
    1 1 tet   620  583 1578 1792
    2 1 tet   656  551  553  566
    3 1 tet  1565  766 1600 1646
    4 1 tet  1545  631 1566 1665
    5 1 hex  1531 1512 1559 1647 1648 1732
    6 1 hex   777 1536 1556 1599 1601 1701
    7 1 quad  296 1568 1535 1604
    8 1 quad   54  711  285  666

As you might guess, the "type" label in the third column indicates the number of columns that follow it. Some of my files contain sections like this of *more than 1 million lines*, so I need to be able to read them fast.

I have not yet come up with a good way to do this. What I do right now is split them up into separate arrays based on the "type" label:

    import numpy as N
    from numpy import array, uint

    lines = [f.next() for i in range(n)]
    lines = [l.split(None, 3) for l in lines]
    id, prop, types, nodes = zip(*lines)  # THIS TAKES /FOREVER/
    id = array(id, dtype=uint)
    prop = array(prop, dtype=uint)
    types = array(types)
    cells = {}
    for t in N.unique(types):
        these = N.flatnonzero(types == t)
        # THIS NEXT LINE TAKES FOREVER
        these_nodes = array([nodes[ii].split() for ii in these],
                            dtype=uint).T
        cells[t] = N.row_stack((id[these], prop[these], these_nodes))

This is really pretty slow and sub-optimal. Has anyone developed a more efficient way to read arrays with variable numbers of columns?

Dan

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
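For what it's worth, one direction that might be faster (a sketch under my own assumptions, not the original poster's method; the function name `read_cells` and the uint64 dtype are my choices) is to bucket the raw lines by their type label in a single Python-level pass, and then parse each bucket with one vectorized split-and-reshape instead of a million tiny per-line splits:

    import numpy as np

    def read_cells(lines):
        """Parse UCD-style cell lines with variable column counts.

        `lines` is the list of raw text lines of the cell section.
        Returns {type_label: array} where each array has rows
        index, property, node0, node1, ... (one column per cell),
        matching the row_stack layout of the original code.
        """
        # Pass 1: bucket whole lines by the type label (3rd column).
        # split(None, 3) only splits far enough to reach the label.
        buckets = {}
        for line in lines:
            t = line.split(None, 3)[2]
            buckets.setdefault(t, []).append(line)

        # Pass 2: all lines of one type have the same column count,
        # so each bucket parses as one flat token array + reshape.
        cells = {}
        for t, chunk in buckets.items():
            flat = np.array(' '.join(chunk).split())
            table = flat.reshape(len(chunk), -1)
            # Drop the textual "type" column, convert the rest to ints.
            nums = np.delete(table, 2, axis=1).astype(np.uint64)
            cells[t] = nums.T
        return cells

The idea is to keep the per-line Python work down to one `split(None, 3)` and a dict append, and let NumPy do the numeric conversion in bulk per type.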