On Mon, 8 Nov 2010 12:43:59 -0500, Scott Deerwester <scott.deerwes...@gmail.com> wrote: > On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak <and...@pitonyak.org> > wrote: >> >> On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester >> <scott.deerwes...@gmail.com> wrote: >>> The following code (in Python): >>> >>> for r in range(dataRange.StartRow, dataRange.EndRow): >>> for c in range(dataRange.StartColumn, dataRange.EndColumn): >>> cell = sheet.getCellByPosition(c,r) >>> >>> >>> takes nearly two hours to run on a reasonably fast workstation for a >>> spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per >>> row). By >>> comparison, opening the file takes about 8 seconds and saving the file >>> to a >>> CSV (which is functionally equivalent to the above) takes a few seconds. >>> The >>> original file was written in Excel as an XLS (not XLSX). This seems >>> impossibly slow. Am I misusing the API somehow? What can I do to speed >>> it >>> up? >> >> >> First, I usually determine which cells are used. Depending on the >> operations, this may be a gross test such as simply finding the largest >> used block. >> >> Next, if I must access the data in the cells, I obtain the data by >> getting >> the entire range and then calling getDataArray() (or getData, depending >> on >> my need / usage). >> >> I have a section on timing in AndrewMacor.odt where I search a Calc >> document using different methods. > > Thanks for the response, Andrew. I'm having trouble understanding how > to use cursors, cell ranges and cell range data. What I want to do is: > > - Create an aggregate that has a particular (rectangular) data range > that is a subset of the used cells. > - Call an efficient accessor (getDataArray sounds promising) to go > over the cells and write them out to a tab-separated file, with some > processing for date formats first. > > I don't think it exists, but what I'd like to do would be: > > cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow, > endCol) > > for rowObj in cellRange.getDataArray(): > for colNum in range(len(rowObj)): > value = rowObj[colNum] > if needSpecialProcessing(startCol + colNum): > doSpecialProcessing(value) > else: > print value > > Barring that, I could do something like: > > cellRange = sheet.getDataArray() > > for rowNum in range(len(cellRange)): > if rowNum >= startRow and rowNum <= endRow: > rowObj = cellRange[rowNum] > for colNum in range(len(rowObj)): > if colNum >= startCol and colNum <= endCol: > value = rowObj[colNum] > if needSpecialProcessing(colNum): > doSpecialProcessing(value) > else: > print value > > but that ends up retrieving a lot more data than I need, and in case > isn't very elegant. Could you give me a simple example of how you'd > generate something that generates the data array, given a starting and > ending row and column?
I cannot easily find or create an example from my present location (I am traveling and working from memory), but, you can get a range from a sheet by position. http://api.openoffice.org/docs/common/ref/com/sun/star/table/XCellRange.html So, to obtain B1:c6 Left = 1 = column B Top = 0 = row 1 Right = 2 = C Bottom = 5 = row 6 DIm od DIm rown As Long Dim coln As Long od = ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0, 2, 5).getDataArray() for rown = LBound(od) To UBound(od) oDataRow = od(rown) For coln = LBound(oDataRow) to UBound(oDataRow) oDataRow(coln) = rown + coln Next Next ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0, 2, 5).setDataArray(od) Also, note Bernard's response with respect to the speed that he observed. Finally, you want to get the used range for the sheet, not the entire sheet I expect. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@api.openoffice.org For additional commands, e-mail: dev-h...@api.openoffice.org