Hi Andrea,

After looking at your other email, I realised that in some circumstances you
do have to use feature chaining for multiple multi-valued properties, i.e.
when they sit at different levels of nesting.
I think this is what Ben meant, e.g.:

<feature>
  <container-att1>
     <subatt1.1>...
     <subatt1.2>...
     <repeated-subatt1>...</repeated-subatt1>
     <repeated-subatt1>...</repeated-subatt1>
     <repeated-subatt1>...</repeated-subatt1>
  </container-att1>
  <container-att2>
     <subatt2.1>...
     <repeated-subatt2>...</repeated-subatt2>
     <repeated-subatt2>...</repeated-subatt2>
     <repeated-subatt2>...</repeated-subatt2>
  </container-att2>
  <container-att3>
     <repeated-subatt3 att="one">...</repeated-subatt3>
     <repeated-subatt3 att="two">...</repeated-subatt3>
     <repeated-subatt3 att="three">...</repeated-subatt3>
  </container-att3>
</feature>

To map this, you'll have to do something like this, before mapping the 
attributes of the sub features:

<AttributeMapping>
     <targetAttribute>container-att1</targetAttribute>
     <idExpression>xyz</idExpression>
     <isMultiple>true</isMultiple>
</AttributeMapping>
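With feature chaining, the usual pattern instead (just a sketch; ex:SubFeature1, PARENT_ID and the ex: prefix are made-up names for illustration) is to map the container attribute in the parent type as a link to a sub-feature type mapped in its own right:

```xml
<AttributeMapping>
    <!-- parent side: points at the separately mapped sub-feature type -->
    <targetAttribute>ex:container-att1</targetAttribute>
    <sourceExpression>
        <!-- column in the parent table holding the key shared with the sub-feature rows -->
        <OCQL>PARENT_ID</OCQL>
        <!-- the chained (sub-)feature type, with its own mapping file -->
        <linkElement>ex:SubFeature1</linkElement>
        <!-- FEATURE_LINK is the app-schema convention for the linking field -->
        <linkField>FEATURE_LINK</linkField>
    </sourceExpression>
    <isMultiple>true</isMultiple>
</AttributeMapping>
```

With a matching FEATURE_LINK mapping on the sub-feature side, validation then sees a complete mapping rather than a header-only one.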
 
This would fail validation if the attribute is not nillable, because the
validator would think the sourceExpression is missing (and therefore null),
since it doesn't know about the next inline mapping (for the sub-attributes).
There's no way for it to tell whether an attribute mapping is only a header
or a final mapping without parsing the whole mapping file every time.
Victor tried to fix this last time, but we realised it was too complicated
and not really fixable, which is why we used feature chaining instead.
Even if it passed validation, I'm not sure it would actually work.

In my initial reply I was thinking of something like this, which is
achievable without feature chaining:

<feature>
  <container-att1>
     <subatt1.1>...
     <subatt1.2>...
     <repeated-subatt1>...</repeated-subatt1>
     <repeated-subatt1>...</repeated-subatt1>
     <repeated-subatt1>...</repeated-subatt1>
     <repeated-subatt2>...</repeated-subatt2>
     <repeated-subatt2>...</repeated-subatt2>
     <repeated-subatt2>...</repeated-subatt2>
     <repeated-subatt3>...</repeated-subatt3>
     <repeated-subatt3>...</repeated-subatt3>
     <repeated-subatt3>...</repeated-subatt3>
  </container-att1>
</feature>

I think I must've thrown away my experiment; it was from when I first started
working on this 2 years ago.
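About the fid paging you suggested below: just to check I understand the idea, something like this? (A plain Python sketch of the batching only, not actual GeoTools code; the file name and batch size are made up.)

```python
def batched_fids(path, batch_size=1000):
    """Yield lists of at most batch_size fids from a temp file
    (one fid per line), so the full fid list never sits in memory."""
    batch = []
    with open(path) as f:
        for line in f:
            fid = line.strip()
            if not fid:
                continue
            batch.append(fid)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # trailing partial batch
        yield batch

# for batch in batched_fids("fids.tmp"):
#     build an Id filter over this batch and run the sub-query,
#     then encode the matching sub-features before moving on
```

Each batch would then become one Id filter for the sub-query, so neither the in-memory fid list nor any single query gets too big.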

Cheers
Rini
________________________________________
From: andrea.a...@gmail.com [andrea.a...@gmail.com] On Behalf Of Andrea Aime 
[andrea.a...@geo-solutions.it]
Sent: Friday, 1 October 2010 5:03 PM
To: Angreani, Rini (CESRE, Kensington)
Cc: geoserver-users@lists.sourceforge.net
Subject: Re: [Geoserver-users] Feature chaining vs denormalized tables

On Fri, Oct 1, 2010 at 10:13 AM, Rini Angreani <rini.angre...@csiro.au> wrote:
>
> Hi Andrea,
>
> I also thought that the main reason to use feature chaining like you said,
> is to avoid managing large denormalised views. The table/view gets really
> big when you have deeply nested features (sub features that also have sub
> features), especially when multiple multi-valued properties are involved (in
> sub features). Although, according to Ben, it wasn't possible to map
> multiple multi-valued properties using grouping alone (without feature
> chaining). Sorry, I don't fully remember the limitations of grouping in old
> app-schema. Anyway, the whole grouping mechanism has been removed now, but
> still remains in 1.6, so if multiple multi-valued properties are involved,
> feature chaining must be used for 2.0 and beyond.

So in the current one it's possible to have multiple multivalued properties
and still use a single table?


> I played around with the alternative solution you proposed a while ago, i.e.
> querying by fids, but when the dataset gets really large, we ran out of
> memory pretty quickly (having to store the fids somewhere).

Yep, pretty understandable.
What you can do is to use paging: retrieve the fids, store them on disk
temporarily using a data output stream, then for each sub query you have
to run get, say, 1000 of those fids and run a first query, then repeat for
the next 1000 and so on. This should avoid keeping a large list of fids
in memory while at the same time avoiding overly large queries.

Do you happen to have your experiments somewhere? I'd like to have
a look at that code.

> The only current plan to improve performance is to possibly halve the time
> it takes, since the queries are run twice (1st one to get size or
> numberOfFeatures printed on the header, and then to actually encode the
> features). This makes sense for simple features, because they're streamed,
> so getting numberOfFeatures before encoding is straightforward. Anyway,
> even when this is fixed, it's still not fast enough.
> There is a plan for improving the performance properly (on the database join
> level or using hibernate), but not in the near future.

Pity :-)

Cheers
Andrea

-----------------------------------------------------
Ing. Andrea Aime
Senior Software Engineer

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054  Massarosa (LU)
Italy

phone: +39 0584962313
fax:     +39 0584962313

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://www.linkedin.com/in/andreaaime
http://twitter.com/geowolf

-----------------------------------------------------
_______________________________________________
Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users