Re: [api-dev] Extremely slow performance with getCellByPosition

Andrew Pitonyak Mon, 08 Nov 2010 12:14:41 -0800

On Mon, 8 Nov 2010 12:43:59 -0500, Scott Deerwester
<[email protected]> wrote:
> On Sun, 07 Nov 2010 18:04:36 -0500 Andrew Pitonyak <[email protected]>
> wrote:
>>
>> On Fri, 5 Nov 2010 21:24:08 -0400, Scott Deerwester
>> <[email protected]> wrote:
>>> The following code (in Python):
>>>
>>> for r in range(dataRange.StartRow, dataRange.EndRow):
>>>     for c in range(dataRange.StartColumn, dataRange.EndColumn):
>>>         cell = sheet.getCellByPosition(c,r)
>>>
>>>
>>> takes nearly two hours to run on a reasonably fast workstation for a
>>> spreadsheet with 32 columns and ~27,000 rows (about .2 seconds per
>>> row). By
>>> comparison, opening the file takes about 8 seconds and saving the file
>>> to a
>>> CSV (which is functionally equivalent to the above) takes a few
seconds.
>>> The
>>> original file was written in Excel as an XLS (not XLSX). This seems
>>> impossibly slow. Am I misusing the API somehow? What can I do to speed
>>> it
>>> up?
>>
>>
>> First, I usually determine which cells are used. Depending on the
>> operations, this may be a gross test such as simply finding the largest
>> used block.
>>
>> Next, if I must access the data in the cells, I obtain the data by
>> getting
>> the entire range and then calling getDataArray() (or getData, depending
>> on
>> my need / usage).
>>
>> I have a section on timing in AndrewMacor.odt where I search a Calc
>> document using different methods.
> 
> Thanks for the response, Andrew. I'm having trouble understanding how
> to use cursors, cell ranges and cell range data. What I want to do is:
> 
> - Create an aggregate that has a particular (rectangular) data range
> that is a subset of the used cells.
> - Call an efficient accessor (getDataArray sounds promising) to go
> over the cells and write them out to a tab-separated file, with some
> processing for date formats first.
> 
> I don't think it exists, but what I'd like to do would be:
> 
>   cellRange = sheet.createRangeFromPosition(startRow, startCol, endRow,
>   endCol)
> 
>   for rowObj in cellRange.getDataArray():
>     for colNum in range(len(rowObj)):
>       value = rowObj[colNum]
>       if needSpecialProcessing(startCol + colNum):
>         doSpecialProcessing(value)
>       else:
>         print value
> 
> Barring that, I could do something like:
> 
>   cellRange = sheet.getDataArray()
> 
>   for rowNum in range(len(cellRange)):
>     if rowNum >= startRow and rowNum <= endRow:
>       rowObj = cellRange[rowNum]
>       for colNum in range(len(rowObj)):
>         if colNum >= startCol and colNum <= endCol:
>           value = rowObj[colNum]
>         if needSpecialProcessing(colNum):
>           doSpecialProcessing(value)
>         else:
>           print value
> 
> but that ends up retrieving a lot more data than I need, and in case
> isn't very elegant. Could you give me a simple example of how you'd
> generate something that generates the data array, given a starting and
> ending row and column?


I cannot easily find or create an example from my present location (I am
traveling and working from memory), but, you can get a range from a sheet
by position.

http://api.openoffice.org/docs/common/ref/com/sun/star/table/XCellRange.html

So, to obtain B1:c6

Left = 1 = column B
Top = 0 = row 1
Right = 2 = C
Bottom = 5 = row 6

DIm od
DIm rown As Long
Dim coln As Long

od =
ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0,
2, 5).getDataArray()

for rown = LBound(od) To UBound(od)
  oDataRow = od(rown)
  For coln = LBound(oDataRow) to UBound(oDataRow)
    oDataRow(coln) = rown + coln
  Next
Next
ThisComponent.getSheets().getSheetByIndex(0).getCellRangeByPosition(1, 0,
2, 5).setDataArray(od)

Also, note Bernard's response with respect to the speed that he observed.

Finally, you want to get the used range for the sheet, not the entire
sheet I expect.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [api-dev] Extremely slow performance with getCellByPosition

Reply via email to