Hello Chris,

At the Portuguese Web Archive we did a study of web characteristics for the Portuguese web. I don't know if this helps you, but here is the paper.
João Miranda, Daniel Gomes, Trends in Web characteristics (best paper award: 2nd place), 7th Latin American Web Congress, Merida, Mexico, November 2009

Link to the paper: http://sobre.arquivo.pt/sobre-o-arquivo/trends-in-web-characteristics/at_download/file
Presentation: http://sobre.arquivo.pt/about-the-archive/presentation-trends-in-web-characteristics
Other publications from our archive: http://sobre.arquivo.pt/about-the-archive/publications?set_language=en

Hope this is of assistance.

Cheers,
Simão Fontes

On Sat, Jan 28, 2012 at 2:01 AM, Mattmann, Chris A (388J) <[email protected]> wrote:
> (sorry for the cross post)
>
> Hey Guys,
>
> I'm trying to find a good citation or estimate (if anyone has done one) that estimates
> the breakout (by % or some other metric) of content types out there out the web
> (with a whole web crawl or a meaningful representative dataset) that are non HTML.
>
> Anyone have any ideas about this?
>
> Thanks!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>

