G'Day, I would echo Paul's comment: you need attribute index support if you want your matching to go well. I know where that support should go in the API, but so far nobody has offered to implement it (for either shapefile or PostGIS).
For reference, I would *like* to see attribute-based indexing go here:

    FeatureList sorted = fcResultat.sortBy( "attributeName" );

This will currently produce an in-memory (sorry!) feature collection that is ordered on the provided attribute name. When you do decide to implement access based on an attribute index, you will be able to run your code unmodified on very large datasets.

Note PostGIS should be much better in general, although the sortBy outlined above is *not* currently implemented for PostGIS either; it would suck the entire feature collection into memory as well :-(

One other thing to check is the implementation of isEmpty() for the feature collection you get back. It may be silly and check that size() != 0, in which case it would traverse the entire contents (and thus be slow). A simple optimization would be to stop traversing as soon as a single feature is found (returning false; true only if nothing is found). This optimization should not be much effort:
- simply use a feature reader with MAX_FEATURES = 1
- then check whether the reader hasNext()

Note that even after the above optimization your overall loop would still be O(N), but at least isEmpty() would no longer traverse the entire contents on every call :-)

Aside: looking at your code, isEmpty() does not buy you anything that iterator.hasNext() would not (since you only seem to be interested in a single match per loop?)
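To make the early-exit idea concrete, here is a minimal sketch in plain Java. The real fix would live inside the feature collection and use a feature reader built from a query with MAX_FEATURES = 1 (e.g. a DefaultQuery with setMaxFeatures(1)) and then test FeatureReader.hasNext(); that wiring is GeoTools-specific, so this sketch shows the same principle with an ordinary Iterator:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

public class EmptyCheckSketch {

    // Early-exit emptiness test: look at (at most) one element instead of
    // counting the whole collection. In GeoTools the analogous move is a
    // query limited to one feature plus a hasNext() check on the reader.
    public static boolean isEmpty(Iterator<?> contents) {
        return !contents.hasNext(); // never walks past the first element
    }

    public static void main(String[] args) {
        System.out.println(isEmpty(Collections.emptyList().iterator())); // true
        System.out.println(isEmpty(Arrays.asList(1, 2, 3).iterator()));  // false
    }
}
```

The point is that the cost no longer depends on how many features match, only on whether at least one does.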
Recommendations (short term):
- remove the isEmpty() check
- turn your code the other way around:
  1) run through your XL file once to build a hash map of attribute value to row number
  2) run through your feature collection, using the hash map to find a matching row if available

Recommendations (mid term):
- use sortBy to grab your entire shapefile into a FeatureList sorted by the selected attribute
- as you go through your XL file, do a binary search in your FeatureList to find the matching feature (you can do this in one line using Collections.binarySearch with a custom Comparator)

Things that you could do to help everyone:
- optimize the isEmpty() check
- implement a FeatureList that is based on traversal of an attribute index, and hook that up to the sortBy method

Cheers,
Jody

> Hi all!
>
> We have created a new uDig wizard which allows us to geocode data
> coming from xls files. This wizard allows us to:
>
> - choose an Excel file,
> - choose a layer from the active map,
> - choose an attribute from the layer and a column from the xls file in
>   order to map the data,
> - then launch our "mapper" to match the data.
>
> Our problem comes from the map action, in which we use a filter:
>
>     for (int iRow = bUtiliserLaPremiereLignePouTitre ? 1 : 0;
>             iRow < iNbLigneAImporter; iRow++) {
>         leRight.setLiteral(sheet.getCell(iColonneNumber, iRow).getContents());
>
>         debug("Start " + iRow + " check feature collection here");
>         fcResultat = fsLayerSelectionne.getFeatures(cf);
>         if (!fcResultat.isEmpty()) {
>             debug("End " + iRow + " check feature collection here");
>
>             Feature f = null;
>             Iterator iFeature = fcResultat.iterator();
>             f = (Feature) iFeature.next();
>
>             clCoucheResultat.createCustomFeature(
>                     f.getDefaultGeometry().getCentroid(),
>                     sheet.getCell(0, iRow).getContents(),
>                     sheet.getCell(iColonneNumber, iRow).getContents());
>         }
>     }
>
> We have to wait a "long" time during the "fcResultat.isEmpty()" call
> when using a shapefile.
> (It's a long time because we may have thousands of lines.)
>
> See the debug info:
>
>     [DEBUG_GeoCodageExcelWizard_16:47:04 453] Start 338 check feature collection here
>     [DEBUG_GeoCodageExcelWizard_16:47:04 484] End 338 check feature collection here
>
> => 31 ms to apply the filter and query.
>
> When I put this layer in memory (by creating a new layer in uDig with
> a MemoryDataStore) performance is much better!
>
>     [DEBUG_GeoCodageExcelWizard_16:51:40 187] Start 338 check feature collection here
>     [DEBUG_GeoCodageExcelWizard_16:51:40 203] End 338 check feature collection here
>
> => 16 ms
>
> I would like to know whether such a difference is normal, and how we
> could reduce the calculation time other than by creating a new
> "in memory" layer. I haven't tried this with another data source
> (PostGIS for example); do you think performance would be better?
>
> Thanks in advance,
>
> Sebastien
_______________________________________________
Geotools-gt2-users mailing list
Geotools-gt2-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-gt2-users