On Thu, Jan 27, 2011 at 4:33 PM, Dewald Pieterse <[email protected]> wrote:
>
> On Thu, Jan 27, 2011 at 4:19 PM, Christopher Barker <[email protected]> wrote:
>
>> On 1/27/11 1:03 PM, Dewald Pieterse wrote:
>>
>>> I am processing two csv files against another. My first implementation
>>> used Python lists of lists and list.append to generate a new list while
>>> looping over all the data, including the non-relevant data (I can't
>>> determine the location of a specific data element in a list of lists).
>>> So I re-implemented the exact same code using numpy arrays (2d arrays),
>>> with numpy.where to avoid looping over the entire dataset needlessly, but
>>> the numpy.array-based code is about 7.6 times slower?
>>>
>>
>> Didn't look at your code in any detail, but:
>>
>> numpy arrays are not designed to be re-sizable, so numpy.append actually
>> creates a new array and copies the old one into it, along with the new
>> stuff. It's a convenience function, but it means you are re-allocating and
>> copying all your data with each call.
>>
>> Python lists, on the other hand, are designed to be re-sizable: they
>> pre-allocate extra room so that appending can be fast.
>>
>> In general, the recommended solution in this sort of situation is to build
>> up your data in a Python list, then convert it to an array.
>>
>> If I'm right about what you're doing, you could keep the "rows" as numpy
>> arrays, but put them in a list while building it up.
>>
>
> Thanks Chris, I believe this is the problem then. I can continue to use the
> arrays as reference data but build a list instead; the only reason I used
> the arrays was to be able to use numpy.where. I'll just use both data
> types, best of both worlds. As I already have row arrays, I will build a
> list of arrays.
>

Now my code is nearly 4 times faster than the list-of-lists implementation!

Wonderful, thanks.

>
>> Also, a numpy array of strings isn't necessarily a great data structure
>> for this kind of data. You might want to look at structured arrays.
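Chris's explanation of why numpy.append is slow can be checked with a small comparison. Appending n rows one at a time with numpy.append copies roughly n*(n-1)/2 rows in total, while list.append is amortized constant time and the list is converted to an array once. This is a minimal sketch assuming six string fields per row, as in the output shown further down; `rows_source`, `build_with_append` and `build_with_list` are hypothetical names, and no particular timings are claimed.

```python
import timeit

import numpy as np


def rows_source(n):
    # Hypothetical stand-in for the thread's CSV data: n rows of 6 string fields.
    return [[str(i), '', '1000', '', '', 'doc %d' % i] for i in range(n)]


def build_with_append(rows):
    # Start from an empty 0x6 string array. Each np.append call allocates a
    # new array and copies every row accumulated so far into it.
    out = np.empty((0, 6), dtype=str)
    for row in rows:
        out = np.append(out, [row], axis=0)
    return out


def build_with_list(rows):
    # Python lists over-allocate, so list.append is amortized O(1);
    # the rows are converted to an array only once, at the end.
    out = []
    for row in rows:
        out.append(row)
    return np.array(out)


if __name__ == '__main__':
    data = rows_source(2000)
    assert (build_with_append(data) == build_with_list(data)).all()
    for build in (build_with_append, build_with_list):
        seconds = timeit.timeit(lambda: build(data), number=1)
        print('%s: %.3f s' % (build.__name__, seconds))
```

Both functions produce the same array; only the growth strategy differs, which is the point Chris makes above.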
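To illustrate the structured-array suggestion: a structured array gives each column a name and its own dtype, so a numeric column such as TreeDepth need not be stored as text. This is a sketch under stated assumptions: the field names come from the header row printed later in the thread, while the field types and string widths (`'U8'`, `'U64'`) are guesses, and `row_dtype` and `to_structured` are hypothetical names. Strings longer than a field's width are silently truncated, so the widths must fit the real data.

```python
import numpy as np

# Hypothetical record layout taken from the header row printed in the thread.
# TreeDepth is stored as an integer; the string widths are assumptions.
row_dtype = np.dtype([
    ('TreeDepth', 'i4'),
    ('Elevation', 'U8'),
    ('BuildingLocater', 'U8'),
    ('Area', 'U8'),
    ('Room', 'U8'),
    ('Item', 'U64'),
])


def to_structured(rows):
    # Structured arrays are built from a list of tuples, one tuple per record;
    # only the first field needs converting from the CSV's string form.
    records = [(int(r[0]),) + tuple(r[1:6]) for r in rows]
    return np.array(records, dtype=row_dtype)


# Sample rows copied from the output shown in the thread.
rows = [
    ['0', '', '1000', '', '', ''],
    ['1', '', '1000', '', '', 'docname Rev 0'],
    ['4', '6', '1164', '4', '', 'anotherdoc Rev A'],
]
table = to_structured(rows)

# Columns are addressed by name and keep their own dtype, so no int() is
# needed when doing arithmetic on TreeDepth.
print(table['TreeDepth'] + 1)                             # -> [1 2 5]
print(table[table['BuildingLocater'] == '1164']['Item'])  # -> ['anotherdoc Rev A']
```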
>>
>
> Atm, I use:
> comit_eqp_reader = csv.reader(comit_eqp_file, delimiter=',', quotechar='"')
> comit_eqp_lt = numpy.array([[col for col in row] for row in comit_eqp_reader])
> to set up the arrays. I will look at using structured arrays.
>
>>
>> I wrote an appendable numpy array class a while back to address this. It
>> has some advantages, though as it is currently written, not as many as
>> you'd think. It does have some benefits for structured arrays, though.
>>
>> Code enclosed
>>
>> -Chris
>>
>>
>>> relevant list of list code:
>>>
>>> starttime = time.clock()
>>> #NI_data_list room_eqp_list
>>> NI_data_list_new = []
>>> for NI_row in NI_data_list:
>>>     treelevel = NI_row[0]
>>>     elevation = NI_row[1]
>>>     locater = NI_row[2]
>>>     area = NI_row[3]
>>>     NIroom = NI_row[4]
>>>     #Write appropriate equipment models and drawing into new list
>>>     if NIroom != '':
>>>         #Write appropriate equipment models and drawing into new list
>>>         for row in room_eqp_list:
>>>             eqp_room = row[0]
>>>             if len(eqp_room) == 5:
>>>                 eqp_drawing = row[1]
>>>                 if NIroom == eqp_room:
>>>                     newrow = [int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]
>>>                     NI_data_list_new.append(newrow)
>>>         #Write appropriate piping info into the new list
>>>         for prow in unique_piping_list:
>>>             pipe_room = prow[0]
>>>             if len(pipe_room) == 5:
>>>                 pipe_drawing = prow[1]
>>>                 if pipe_room == NIroom:
>>>                     piperow = [int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]
>>>                     NI_data_list_new.append(piperow)
>>>     #Write appropriate equipment models and drawing into new list
>>>     if (locater != '' and NIroom == ''):
>>>         #Write appropriate equipment models and drawing into new list
>>>         for row in room_eqp_list:
>>>             eqp_locater = row[0]
>>>             if len(eqp_locater) == 4:
>>>                 eqp_drawing = row[1]
>>>                 if locater == eqp_locater:
>>>                     newrow = [int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]
>>>                     NI_data_list_new.append(newrow)
>>>         #Write appropriate piping info into the new list
>>>         for prow in unique_piping_list:
>>>             pipe_locater = prow[0]
>>>             if len(pipe_locater) == 4:
>>>                 pipe_drawing = prow[1]
>>>                 if pipe_locater == locater:
>>>                     piperow = [int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]
>>>                     NI_data_list_new.append(piperow)
>>>     #Rewrite NI_data to new list
>>>     if NIroom == '':
>>>         NI_data_list_new.append(NI_row)
>>>
>>> print (time.clock()-starttime)
>>>
>>>
>>> relevant numpy.array code:
>>>
>>> NI_data_write_url = reports_dir + 'NI_data_room2.csv'
>>> NI_data_list_file = open(NI_data_write_url, 'wb')
>>> NI_data_list_writer = csv.writer(NI_data_list_file, delimiter=',', quotechar='"')
>>> starttime = time.clock()
>>> #NI_data_list room_eqp_list
>>> NI_data_list_new = numpy.array([['TreeDepth', 'Elevation', 'BuildingLocater', 'Area', 'Room', 'Item']])
>>> for NI_row in NI_data_list:
>>>     treelevel = NI_row[0]
>>>     elevation = NI_row[1]
>>>     locater = NI_row[2]
>>>     area = NI_row[3]
>>>     NIroom = NI_row[4]
>>>     #Write appropriate equipment models and drawing into new array
>>>     if NIroom != '':
>>>         #Write appropriate equipment models and drawing into new array
>>>         (rowtest, columntest) = numpy.where(room_eqp_list==NIroom)
>>>         for row_iter in rowtest:
>>>             eqp_room = room_eqp_list[row_iter,0]
>>>             if len(eqp_room) == 5:
>>>                 eqp_drawing = room_eqp_list[row_iter,1]
>>>                 if NIroom == eqp_room:
>>>                     newrow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,eqp_drawing]])
>>>                     NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
>>>
>>>         #Write appropriate piping info into the new array
>>>         (rowtest, columntest) = numpy.where(unique_room_piping_list==NIroom)
>>>         for row_iter in rowtest: #unique_room_piping_list
>>>             pipe_room = unique_room_piping_list[row_iter,0]
>>>             if len(pipe_room) == 5:
>>>                 pipe_drawing = unique_room_piping_list[row_iter,1]
>>>                 if pipe_room == NIroom:
>>>                     piperow = numpy.array([[int(treelevel)+1,elevation,locater,area,NIroom,pipe_drawing]])
>>>                     NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
>>>     #Write appropriate equipment models and drawing into new array
>>>     if (locater != '' and NIroom == ''):
>>>         #Write appropriate equipment models and drawing into new array
>>>         (rowtest, columntest) = numpy.where(room_eqp_list==locater)
>>>         for row_iter in rowtest:
>>>             eqp_locater = room_eqp_list[row_iter,0]
>>>             if len(eqp_locater) == 4:
>>>                 eqp_drawing = room_eqp_list[row_iter,1]
>>>                 if locater == eqp_locater:
>>>                     newrow = numpy.array([[int(treelevel)+1,elevation,eqp_locater,area,'',eqp_drawing]])
>>>                     NI_data_list_new = numpy.append(NI_data_list_new, newrow, 0)
>>>         #Write appropriate piping info into the new array
>>>         (rowtest, columntest) = numpy.where(unique_room_piping_list==locater)
>>>         for row_iter in rowtest:
>>>             pipe_locater = unique_room_piping_list[row_iter,0]
>>>             if len(pipe_locater) == 4:
>>>                 pipe_drawing = unique_room_piping_list[row_iter,1]
>>>                 if pipe_locater == locater:
>>>                     piperow = numpy.array([[int(treelevel)+1,elevation,pipe_locater,area,'',pipe_drawing]])
>>>                     NI_data_list_new = numpy.append(NI_data_list_new, piperow, 0)
>>>     #Rewrite NI_data to new list
>>>     if NIroom == '':
>>>         NI_data_list_new = numpy.append(NI_data_list_new,[NI_row],0)
>>>
>>> print (time.clock()-starttime)
>>>
>>>
>>> some relevant output
>>>
>>> >>> print NI_data_list_new
>>> [['TreeDepth' 'Elevation' 'BuildingLocater' 'Area' 'Room' 'Item']
>>>  ['0' '' '1000' '' '' '']
>>>  ['1' '' '1000' '' '' 'docname Rev 0']
>>>  ...,
>>>  ['5' '6' '1164' '4' '' 'eqp11 RB, R. surname, 24-NOV-08']
>>>  ['4' '6' '1164' '4' '' 'anotherdoc Rev A']
>>>  ['0' '' '' '' '' '']]
>>>
>>>
>>> Is numpy.append that slow, or is numpy.where the culprit?
>>>
>>> Dewald Pieterse
>>>
>>> "A democracy is nothing more than mob rule, where fifty-one percent of
>>> the people take away the rights of the other forty-nine."
>>> ~ Thomas Jefferson
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> [email protected]
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> [email protected]
>>
>
> --
> Dewald Pieterse
>

--
Dewald Pieterse
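Chris mentions an appendable numpy array class, but its code was an attachment that is not reproduced in this thread. The following is a hypothetical sketch of the same general idea, not his class: like a Python list, it keeps spare capacity and doubles it when full, so n appends copy O(n) elements in total rather than O(n**2) as repeated numpy.append does. `GrowableArray` and its methods are invented names, and the initial capacity of 16 is an arbitrary assumption.

```python
import numpy as np


class GrowableArray:
    """Hypothetical appendable 1-D array (a sketch, not Chris's class).

    Like a Python list, it keeps spare capacity and doubles it when full,
    so n appends copy O(n) elements in total instead of O(n**2).
    """

    def __init__(self, dtype, capacity=16):
        # At least one slot, so that doubling always increases the capacity.
        self._data = np.empty(max(1, capacity), dtype=dtype)
        self._n = 0

    def append(self, item):
        if self._n == len(self._data):
            # Full: allocate double the capacity and copy the filled part once.
            bigger = np.empty(2 * len(self._data), dtype=self._data.dtype)
            bigger[:self._n] = self._data[:self._n]
            self._data = bigger
        self._data[self._n] = item
        self._n += 1

    def __len__(self):
        return self._n

    def to_array(self):
        # Copy of the filled part, so later appends cannot affect it.
        return self._data[:self._n].copy()


# Works with a structured dtype too: each appended item is one record tuple.
rec = np.dtype([('TreeDepth', 'i4'), ('Item', 'U32')])
g = GrowableArray(rec, capacity=2)
for i in range(5):
    g.append((i, 'doc %d' % i))
arr = g.to_array()
print(len(g), arr['TreeDepth'])  # -> 5 [0 1 2 3 4]
```

Doubling is one common growth policy; a smaller factor wastes less memory at the cost of more frequent copies.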
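On the original question of whether numpy.where is also a culprit: each `numpy.where(room_eqp_list==NIroom)` call first builds a boolean array comparing every element of `room_eqp_list` with the key, so every lookup still visits the whole reference array, just in C rather than in a Python loop. An alternative not raised in the thread is to index the reference rows once in a Python dict keyed on the room or locater code, after which each lookup is a hash-table access. This is a minimal sketch of only the room branch of the code above; `index_by_key`, `expand_rooms` and the sample data are hypothetical.

```python
from collections import defaultdict


def index_by_key(rows, key_len):
    # One pass over the reference rows: map each column-0 code of the given
    # length (the thread checks len()==5 for rooms, len()==4 for locaters)
    # to the list of drawings in column 1.
    index = defaultdict(list)
    for row in rows:
        if len(row[0]) == key_len:
            index[row[0]].append(row[1])
    return index


def expand_rooms(ni_rows, eqp_rows):
    # Simplified room branch only: rows with a room are replaced by one new
    # row per matching drawing; rows without a room are copied through.
    eqp_by_room = index_by_key(eqp_rows, 5)
    out = []
    for ni_row in ni_rows:
        treelevel, elevation, locater, area, room = ni_row[:5]
        if room != '':
            # A dict lookup instead of scanning every equipment row.
            for drawing in eqp_by_room.get(room, []):
                out.append([int(treelevel) + 1, elevation, locater, area, room, drawing])
        else:
            out.append(ni_row)
    return out


# Hypothetical sample data in the thread's column layout.
eqp = [['R0101', 'eqp11 RB'], ['R0101', 'eqp12 RB'], ['1164', 'anotherdoc Rev A']]
ni = [['3', '6', '1164', '4', 'R0101', ''], ['4', '6', '1164', '4', '', '']]
for new_row in expand_rooms(ni, eqp):
    print(new_row)
```

The results are still accumulated in a Python list, following Chris's advice, and can be converted to an array once at the end.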
