> using DDG or Scroogle - isn't Leaving the Cloud
> Not missing this, because a lot of us are using it
For sure, and even the Seeks Project isn't Leaving the Cloud. The sad thing is that even the Seeks Project PM says that distributed search (crawling/indexing) isn't possible here and now, that it's complicated, etc. I know many people who don't mention Search because they have simply surrendered the idea of "something" that could displace Google and the others.

What I know about Search: even crawling everything is doable. There is a ton of scientific material on distributed crawling, and the search giants do distributed crawling themselves. Where it might be said that this is not for client machines (I believe it surely is), some flavor of FBox could perform the needed tasks (maybe not a Plug, as it could be weak on performance, but some other type of server, like those small multimedia PC centers). I found a link where some people crawled a third of the Egyptian internet in a week or two. That's not so bad! It may not be as fresh as Google's, but there are tons of algorithms for cleverly renewing results, re-using what is already there, etc.

Research on P2P crawling and distributed indexing has been around for a decade or more. It clearly shows that the scientific/academic world is constantly excited by this idea. Given existing initiatives like, let's say, grub.org (non-Java), which keep evolving and reinventing themselves for that new Search goal, I think an infrastructure based on FBoxes could exist, and the people who could provide resources are already working on it.

I won't say something unrealistic to this list out of affection for the idea. Believe me, I've done the needed research, used many flavors of meta-search, and contributed to the FBox wiki on this theme. I'm part of a team developing Search for our OT (concurrent editing) and XMPP social networks, based on Robots as insiders and on Pushing, so that it looks not so dumb compared to today's crawling practice.
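To make the distributed crawling idea a bit more concrete, here is a minimal sketch of one common way to split crawl work among volunteer machines: hash each URL's host name and let the hash pick the responsible box. This is only an illustration under my own assumptions, not how grub.org, Seeks or FBox actually do it; the function names and the box names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def assign_peer(url: str, peers: List[str]) -> str:
    """Pick the peer responsible for a URL by hashing its host name.

    All URLs on the same host land on the same peer, so one box can
    handle politeness (rate limiting, robots.txt) for that host.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(peers)
    return peers[index]


def partition(urls: Iterable[str], peers: List[str]) -> Dict[str, List[str]]:
    """Split a list of URLs into one work queue per peer."""
    work: Dict[str, List[str]] = {peer: [] for peer in peers}
    for url in urls:
        work[assign_peer(url, peers)].append(url)
    return work


# Hypothetical peers and seed URLs, for illustration only.
peers = ["box-a", "box-b", "box-c"]
urls = [
    "http://example.org/page1",
    "http://example.org/page2",
    "http://example.net/",
    "http://example.com/about",
]
queues = partition(urls, peers)
```

Plain modulo hashing reshuffles most hosts whenever a box joins or leaves; a real deployment would more likely use something like consistent hashing, but the principle of "no central coordinator decides who crawls what" stays the same.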
What I can say for sure: constant polling (crawling) is a bad method and unsuited to next-gen, real-time search, even for corporations like Google with all their resources. And believe me, Google+ won't use crawling for its network. There are articles describing how the predecessor of +, Wave, and many of their other initiatives were built around the idea of real-time, based on pushing the result. Now is a very interesting time, when not-so-big start-ups like Superfeedr.com or Collecta.com could eventually compete with Google etc. on real-time search, and so could we. What does suit is Push-search, as opposed to Pull-crawling. Those techniques, if massively used, could introduce a whole New World where Search companies would compete on UI/UX and brave ideas, not on closed DBs.

Whether it will be PubSubHubBub, XEP-0060 PubSub, or some other popular technique, it is possible to compete with the Corporations there. As for P2P crawling: even if it starts with, let's say, a month's gap in collected data compared to the structure the Corporations have built, P2P or other user-based distributed crawling (with old Polling) could eventually, finally, replace what Google etc. are doing in this area, and that could be proven by facts.

Seeks Project, as "the king" of all the meta-search ideas I've seen, has a long-term goal of eventually becoming a crawling plug-in for existing servers, but it looks like they don't believe in this themselves. Maybe FBox could inspire them; maybe FBox could just use Push for its own networks. Either way, the FreedomBox Foundation, given its stated ideology, is responsible for Leaving the Cloud too.

> but more importantly, we will need to debate a topic that has been bounced
> around in other similar projects; a search engine that will not only index
> our user published privacy/anonymous aware sites and material, as well as
> tor/i2p/freenet sites. The reason why i bring this up is, there was a big
> debate in some of the larger anonymous/privacy concerned projects,
> including TOR about having an internal search indexing service.. and the
> general consensus was that the ability for sites to not be found on a
> search engine is a benefit, not a flaw. Not just the idea, but the request
> that some sites, their users, and content providers of these sites don't
> want to be found, don't want to be indexed, and don't want to be identified.

In our Wave-like networks, which are inspired by the Robots/Gadget scheme from Google Wave, the idea is that you, as the admin of a Wave or as a trusted participant, can add the "Crawly" Robot to the discussion, and it will crawl and collect what is pushed to it only if you wish so. What you're talking about could be a matter for robots.txt. I should say that robots.txt and http://Schema.org, which the modern Search Corporations want from you, are also constantly changing and are a theme for debate; they could be reinvented or improved.

What I really need to remind readers of: Google itself has been on the real market for less than 13 years. In my opinion, 13 years based on Good Promises and Not So Adequate Realization (from what you can read in http://ilpubs.stanford.edu:8090/361/ and http://infolab.stanford.edu/~backrub/google.html) doesn't give any Astonishing Advantage over what the FLOSS community and the scientific world are capable of now.

_______________________________________________
Freedombox-discuss mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/freedombox-discuss
