G'Day,

I would echo Paul's comment that you need attribute index support if you
want your matching to perform well. I know where that should go in the API,
but so far nobody has offered to implement it (for either shapefile or
PostGIS).

For reference, I would *like* to see attribute-based index support go here:

FeatureList sorted = fcResultat.sortBy( "attributeName" );

This will currently produce an in-memory (sorry!) feature collection
that is ordered on the provided attribute name. When you do decide to
implement access based on an attribute index, you will be able to run
your code unmodified on very large datasets.

Note that PostGIS should perform much better, although the sortBy
approach outlined above is *not* currently backed by an index for
PostGIS either: it would pull the entire feature collection into memory
as well :-(

One other thing to check is the implementation of isEmpty() for the
feature collection you get back: it may naively be implemented as
size() == 0, in which case it would traverse the entire contents (and
thus be slow). A simple optimization would be to traverse the contents
only until the first match is found, and then return false (the
collection is not empty).

This optimization should not be much effort:
- simply use a feature reader with MAX_FEATURES = 1
- and then check whether the reader hasNext()
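A minimal sketch of why this helps, using plain java.util types instead of the GeoTools API (FilteredFeatureSource is a hypothetical stand-in for a filtered shapefile collection, its iterator plays the role of the feature reader, and featuresRead counts how many raw records each isEmpty() strategy touches):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical stand-in for a filtered, file-backed feature collection. */
final class FilteredFeatureSource {
    private final List<String> rawFeatures; // records as stored in the file
    private final Predicate<String> filter; // plays the role of the query Filter
    int featuresRead = 0;                   // raw records read so far

    FilteredFeatureSource(List<String> rawFeatures, Predicate<String> filter) {
        this.rawFeatures = rawFeatures;
        this.filter = filter;
    }

    /** Plays the role of a feature reader: reads raw records only on demand. */
    Iterator<String> iterator() {
        return new Iterator<String>() {
            private int pos = 0;
            private String pending = null; // next matching record, if already found

            @Override
            public boolean hasNext() {
                while (pending == null && pos < rawFeatures.size()) {
                    String candidate = rawFeatures.get(pos++);
                    featuresRead++;
                    if (filter.test(candidate)) {
                        pending = candidate;
                    }
                }
                return pending != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = pending;
                pending = null;
                return result;
            }
        };
    }

    /** Counting every match means reading every record. */
    int size() {
        int n = 0;
        for (Iterator<String> it = iterator(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    /** The "silly" version: always reads all N records. */
    boolean naiveIsEmpty() {
        return size() == 0;
    }

    /** Stops at the first match (like a reader limited to one feature). */
    boolean fastIsEmpty() {
        return !iterator().hasNext();
    }

    public static void main(String[] args) {
        FilteredFeatureSource src = new FilteredFeatureSource(
                Arrays.asList("a", "b", "match", "c", "d"), s -> s.equals("match"));

        boolean empty = src.naiveIsEmpty();
        System.out.println(empty + " after " + src.featuresRead + " reads");

        src.featuresRead = 0;
        empty = src.fastIsEmpty();
        System.out.println(empty + " after " + src.featuresRead + " reads");
    }
}
```

Running main prints `false after 5 reads` for the size()-based check and `false after 3 reads` for the hasNext()-based one. When nothing matches, both still read every record, which is the O(N) worst case. In real code the reader should also be closed after the check.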

Note that even after the above optimization, isEmpty() would still be
O(N) in the worst case (when nothing matches, every feature is read),
but at least it will not *always* read all N features :-)

Aside: looking at your code, isEmpty() does not buy you anything that
iterator.hasNext() would not (you only seem to be interested in a
single match per loop?)
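A sketch of the loop without the separate isEmpty() call, again with plain java.util types: the hypothetical query function stands in for something like fsLayerSelectionne.getFeatures(filter).iterator(), and hasNext() serves as the emptiness test, so each row costs one query instead of two traversals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class FirstMatchLoop {
    /**
     * For each key (stand-in for one spreadsheet cell), run the query once and
     * keep only its first result. hasNext() doubles as the emptiness test, so
     * no separate isEmpty() call is made.
     */
    static <F> List<F> firstMatches(List<String> keys, Function<String, Iterator<F>> query) {
        List<F> matches = new ArrayList<>();
        for (String key : keys) {
            Iterator<F> it = query.apply(key);
            if (it.hasNext()) {
                matches.add(it.next());
            }
            // In real code, close the iterator/reader here to release the file.
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical data: attribute value -> features carrying that value.
        Map<String, List<String>> features = new HashMap<>();
        features.put("A", Arrays.asList("featureA1", "featureA2"));
        features.put("C", Collections.singletonList("featureC"));

        List<String> found = firstMatches(Arrays.asList("A", "B", "C"),
                key -> features.getOrDefault(key, Collections.<String>emptyList()).iterator());
        System.out.println(found); // [featureA1, featureC]
    }
}
```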

Recommendations (short term):
- remove the isEmpty() check
- turn your code the other way round:
  1) run through your Excel file once to build a hash map of attribute
     value to row number
  2) run through your feature collection, using the hash map to find a
     matching row if available
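A minimal sketch of the two passes, assuming the Excel column has already been read into a List<String> (header row excluded) and that a hypothetical Feat class carries the matched attribute plus a placeholder for the geometry. Because several rows may share a value, the sketch maps each value to a list of row numbers rather than a single one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class HashJoinSketch {
    /** Hypothetical feature: the matched attribute plus a placeholder geometry. */
    static final class Feat {
        final String key;
        final String geometry;

        Feat(String key, String geometry) {
            this.key = key;
            this.geometry = geometry;
        }
    }

    /** Pass 1: one read of the spreadsheet column, value -> row numbers. */
    static Map<String, List<Integer>> indexRows(List<String> column) {
        Map<String, List<Integer>> rowsByValue = new HashMap<>();
        for (int row = 0; row < column.size(); row++) {
            rowsByValue.computeIfAbsent(column.get(row), v -> new ArrayList<>()).add(row);
        }
        return rowsByValue;
    }

    /** Pass 2: one read of the features; each row keeps the first feature that matches it. */
    static Map<Integer, Feat> match(List<String> column, Iterable<Feat> features) {
        Map<String, List<Integer>> rowsByValue = indexRows(column);
        Map<Integer, Feat> featureByRow = new TreeMap<>();
        for (Feat f : features) {
            List<Integer> rows = rowsByValue.get(f.key);
            if (rows == null) {
                continue; // no spreadsheet row uses this value
            }
            for (int row : rows) {
                featureByRow.putIfAbsent(row, f);
            }
        }
        return featureByRow;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("X", "Y", "X", "Z");
        List<Feat> features = Arrays.asList(
                new Feat("Y", "geomY"), new Feat("X", "geomX1"), new Feat("X", "geomX2"));
        for (Map.Entry<Integer, Feat> e : match(column, features).entrySet()) {
            System.out.println("row " + e.getKey() + " -> " + e.getValue().geometry);
        }
    }
}
```

This replaces one filtered query per spreadsheet row with a single pass over each source, so the work grows roughly with rows + features instead of rows × features.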

Recommendations (mid term):
- use sortBy to grab your entire shapefile into a FeatureList sorted by
  the selected attribute
- as you go through your Excel file, do a binary search in your
  FeatureList to find the matching feature (you can do this in one line
  with Collections.binarySearch and a custom Comparator)
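A sketch of the sorted-list variant, with a plain ArrayList sort standing in for sortBy (which, as noted above, currently works in memory anyway). Collections.binarySearch needs a key of the list's element type, so the sketch searches for a "probe" feature carrying only the attribute value; Feat and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class BinarySearchSketch {
    /** Hypothetical feature: the matched attribute plus a placeholder geometry. */
    static final class Feat {
        final String key;
        final String geometry;

        Feat(String key, String geometry) {
            this.key = key;
            this.geometry = geometry;
        }
    }

    /** Orders features by the selected attribute. */
    static final Comparator<Feat> BY_KEY = Comparator.comparing((Feat f) -> f.key);

    /** In-memory stand-in for sortBy: copy the features and sort them by the attribute. */
    static List<Feat> sortByKey(Collection<Feat> features) {
        List<Feat> sorted = new ArrayList<>(features);
        sorted.sort(BY_KEY);
        return sorted;
    }

    /**
     * O(log N) lookup against a probe feature that only carries the value.
     * With duplicate values, which matching feature is returned is unspecified.
     */
    static Feat find(List<Feat> sorted, String value) {
        int i = Collections.binarySearch(sorted, new Feat(value, null), BY_KEY);
        return i >= 0 ? sorted.get(i) : null;
    }

    public static void main(String[] args) {
        List<Feat> sorted = sortByKey(Arrays.asList(
                new Feat("C", "geomC"), new Feat("A", "geomA"), new Feat("B", "geomB")));
        for (String cell : Arrays.asList("B", "Z")) {
            Feat f = find(sorted, cell);
            System.out.println(cell + " -> " + (f == null ? "no match" : f.geometry));
        }
    }
}
```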

Things that you could do to help everyone:
- optimize the isEmpty() check
- implement a FeatureList that is based on traversal of an attribute
  index, and hook that up to the sortBy method.
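To make the attribute index idea concrete, here is a toy in-memory sketch (not GeoTools code): a TreeMap from attribute value to record numbers answers equality lookups without a scan, and walking it in key order yields record numbers already sorted by the attribute, which a lazily read list could traverse. The class name and the readRecord callback are hypothetical; a real index would be persisted next to the data file.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;
import java.util.function.IntFunction;

/** Toy attribute index: attribute value -> record numbers, kept in sorted order. */
final class AttributeIndexSketch {
    private final TreeMap<String, List<Integer>> recordsByValue = new TreeMap<>();

    /** Built once from the attribute column (a real index would be persisted). */
    AttributeIndexSketch(List<String> attributeColumn) {
        for (int rec = 0; rec < attributeColumn.size(); rec++) {
            recordsByValue.computeIfAbsent(attributeColumn.get(rec), v -> new ArrayList<>()).add(rec);
        }
    }

    /** Equality lookup without scanning the data. */
    List<Integer> lookup(String value) {
        return recordsByValue.getOrDefault(value, Collections.<Integer>emptyList());
    }

    /** All record numbers, ordered by attribute value. */
    List<Integer> sortedRecordNumbers() {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> recs : recordsByValue.values()) {
            out.addAll(recs);
        }
        return out;
    }

    /** A list view that reads each record only when get(i) is called. */
    <F> List<F> asSortedList(IntFunction<F> readRecord) {
        List<Integer> order = sortedRecordNumbers();
        return new AbstractList<F>() {
            @Override
            public F get(int i) {
                return readRecord.apply(order.get(i));
            }

            @Override
            public int size() {
                return order.size();
            }
        };
    }

    public static void main(String[] args) {
        AttributeIndexSketch index = new AttributeIndexSketch(Arrays.asList("c", "a", "b", "a"));
        System.out.println(index.lookup("a"));           // [1, 3]
        System.out.println(index.sortedRecordNumbers()); // [1, 3, 2, 0]
        System.out.println(index.asSortedList(rec -> "record" + rec).get(0)); // record1
    }
}
```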

Cheers,
Jody
> Hi all!
>
> We have created a new uDig wizard which allows us to geocode data
> coming from xls files. This wizard allows us to:
>
> - choose an Excel file,
> - choose a layer from the active map,
> - choose an attribute from the layer and a column from the xls file in 
> order to map data,
> - then launch our "mapper" to match data.
>
> Our problem comes from the map action, in which we use a filter:
>
>             for(int iRow = bUtiliserLaPremiereLignePouTitre?1:0 ; iRow < iNbLigneAImporter; iRow++){
>                 leRight.setLiteral(sheet.getCell(iColonneNumber, iRow).getContents());
>
>                 debug("Start "+ iRow +" check feature collection here");
>                 fcResultat = fsLayerSelectionne.getFeatures(cf);
>                 if (!fcResultat.isEmpty()){
>                     debug("End "+ iRow +" check feature collection here");
>
>                     Feature f  = null;
>                     Iterator iFeature = fcResultat.iterator();
>                     f = (Feature) iFeature.next();
>                     clCoucheResultat.createCustomFeature(f.getDefaultGeometry().getCentroid(),
>                             sheet.getCell(0,iRow).getContents(),
>                             sheet.getCell(iColonneNumber, iRow).getContents());
>                 }
>             }
>
> We have to wait a "long" time during "fcResultat.isEmpty()" when
> using a shapefile (it adds up to a long time because we may have
> thousands of lines).
>
> See the debug info:
>
> [DEBUG_GeoCodageExcelWizard_16:47:04 453] Start 338 check feature collection here
> [DEBUG_GeoCodageExcelWizard_16:47:04 484] End 338 check feature collection here
> => 31ms to apply the filter and query.
>
>
> When I put this layer in memory (by creating a new layer in uDig
> with MemoryDataStore), performance is much better!!
>
> [DEBUG_GeoCodageExcelWizard_16:51:40 187] Start 338 check feature collection here
> [DEBUG_GeoCodageExcelWizard_16:51:40 203] End 338 check feature collection here
> => 16ms
>
> I would like to know whether such a difference is normal, and how we
> could reduce calculation time other than by creating a new "in memory"
> layer.
> I haven't tried this with other data sources (PostGIS, for example); do
> you think performance would be better?
>
>
> Thanks in advance,
>
> Sebastien
>
> ------------------------------------------------------------------------
>
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys -- and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> ------------------------------------------------------------------------
>
> _______________________________________________
> Geotools-gt2-users mailing list
> Geotools-gt2-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geotools-gt2-users
>   

