Hi all, I've implemented a new method of serializing SimpleFeatures using Apache Avro (http://avro.apache.org/) that combines dynamic Avro schema generation with binary serialization. This results in smaller data size but, more importantly, faster feature serialization and deserialization than the traditional DataUtilities.encodeFeature() and DataUtilities.createFeature() methods. Avro is well suited for MapReduce, Accumulo, or other Hadoop-based frameworks where data serialization and I/O speed are important.
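To make the idea of dynamic schema generation concrete, here is a minimal sketch in plain Java (no Avro or GeoTools dependency) that turns a "name:Type" attribute specification into Avro record schema JSON. The class name SpecToAvroSchema, the method toAvroSchema, and the type mapping are all assumptions made for illustration, in particular Date as epoch milliseconds and Point as WKB bytes; this is not the actual AvroSimpleFeature code. It handles only simple "name:Type" pairs and does not validate attribute names against Avro's naming rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of GeoTools or Avro: builds Avro schema JSON
// from a spec string such as "name:String,count:Integer".
final class SpecToAvroSchema {

    // Assumed type mapping; a real implementation may encode dates and
    // geometries differently.
    private static final Map<String, String> AVRO_TYPES = new LinkedHashMap<>();
    static {
        AVRO_TYPES.put("String", "string");
        AVRO_TYPES.put("Integer", "int");
        AVRO_TYPES.put("Long", "long");
        AVRO_TYPES.put("Double", "double");
        AVRO_TYPES.put("Boolean", "boolean");
        AVRO_TYPES.put("Date", "long");   // assumption: epoch milliseconds
        AVRO_TYPES.put("Point", "bytes"); // assumption: geometry stored as WKB
    }

    // Emits one Avro field per attribute, in the same order as the spec.
    static String toAvroSchema(String typeName, String spec) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"").append(typeName)
            .append("\",\"fields\":[");
        String[] attributes = spec.split(",");
        for (int i = 0; i < attributes.length; i++) {
            String[] parts = attributes[i].trim().split(":");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected name:Type, got: " + attributes[i]);
            }
            String avroType = AVRO_TYPES.get(parts[1].trim());
            if (avroType == null) {
                throw new IllegalArgumentException("No mapping for type: " + parts[1]);
            }
            if (i > 0) {
                json.append(',');
            }
            json.append("{\"name\":\"").append(parts[0].trim())
                .append("\",\"type\":\"").append(avroType).append("\"}");
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toAvroSchema("example",
                "field1:String,field2:Integer,field3:Date,field4:Point"));
    }
}
```

The resulting JSON string could then be handed to Avro's schema parser; keeping the field order identical to the attribute order is what lets a reader project a subset of attributes by position.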
Essentially, the SimpleFeatureType specification (e.g. field1:String,field2:Integer,field3:Date,field4:Point) has a 1:1 mapping with an Avro schema. Another great feature is speed of transformation. If we are transforming SimpleFeatureType x to type y, where y is a subset of x, a custom deserialization method "skips" over the fields that are not used in the transformation. This can drastically reduce the time to render fields in GeoServer. It's the analog of doing a SELECT a,b,c rather than a SELECT * against a serialized feature.

We're interested in creating a pluggable serialization framework that allows developers to choose between Avro, text, and other formats of interest (e.g. JSON), so that we can share the AvroSimpleFeature with others and register serializers via SPI, similar to DataSources.

On another note, what is the standard pipe-delimited format known as? Pipe Delimited Text?

If anyone has any ideas/comments/interest, let me know!

-Andrew Hulbert

GeoTools-Devel mailing list
https://lists.sourceforge.net/lists/listinfo/geotools-devel
