On Fri, 2007-10-19 at 22:15 +0100, Martin McEvoy wrote:
> On Fri, 2007-10-19 at 10:57 -0400, Manu Sporny wrote:
> > Martin McEvoy wrote:
> > >>> I Really don't think that we can have a clear Idea of what hAudio is
> > >>> Until our examples are re-studied without the use of a program.
> > >
> > > Because it is my opinion that the data output of your application is not
> > > to be relied upon
> >
> > I don't want this to become a nasty discussion,
>
> ?? now you are confusing me, this is a nasty discussion because I ask
> questions?
>
> > Martin. I realize that
> > you have questions about Microformalyze and I am attempting to answer them.
> >
> > I believe the tone of this discussion is a bit off... right now, it
> > sounds like you're alluding to the notion that there has been some sort
> > of "nefarious behavior" when gathering data for hAudio,
>
> I am not saying that there is some sort of sinister behavior going on at
> all I am pointing out that the data that Microformalyze outputs (in the
> terminal) is not to be trusted.
>
> > or that the data
> > we have is not dependable. I realize that my responses could have been
> > less inflammatory and more explanatory.
> >
> > I am going to attempt to explain how Microformalyze works in a more
> > explanatory manner.
> >
> > >> Why do you think this approach is going to help us?
> > >
> > > Why do you think that the Microformalyze approach is going to help us?
> > > do you not think the Hand and Eye are a better approach?
> >
> > Microformalyze is a "Hand and Eye" approach... there is no automation to
> > the "analyzing a web page" part of the tool.
> >
> > ...
>
> > It saves us the time from having to tally statistics by hand. It is also
> > far more accurate to have a machine tally the results and statistics.
> >
> > ...
>
> > Before we were using Microformalyze there were several errors when
> > calculating the statistics that I made.
> > It is difficult to go through 48
> > examples and over 1,000 properties by hand, calculate statistics, and
> > not expect some human error.
> >
> > Here's how we used to gather examples for hAudio:
> >
> > 1. Open up the hAudio Wiki.
> > 2. Copy/Paste one example URL into a different tab in the web browser.
> > 3. Copy/Paste the hAudio example template that had all of the properties
> >    into the correct part of the wiki page.
> > 4. Flip between the hAudio Wiki tab and the example URL page, adding or
> >    deleting properties from hAudio.
> > 5. Repeat this process 84 times (each page took around 20 minutes to
> >    analyze).
> >
> > Here's how it works with Microformalyze:
> >
> > 1. Open up Microformalyze
> > 2. Click "Add URL" to add URLs that need to be analyzed.
> > 3. Click "Add property" to add properties that you expect to see (this
> >    can also be done while you're analyzing the pages)
> > 4. Once all of the URLs that need to be analyzed have been added, you
> >    click the "Next URL" button.
> > 5. Microformalyze displays the URL in a web browser and you click
> >    checkboxes to specify what properties exist on the example URL page.
>
> So I tell the application what properties exist on a given page, and It
> confirms if this is true or not?
>
> >    This small change to the process cuts down the time to analyze a
> >    page greatly... mainly because you're not editing wiki text, you're
> >    just clicking a checkbox.
> > 6. Repeat this process 84 times (each page took around 5 minutes to
> >    analyze).
> >
> > The old way of doing things took around 20 minutes per website. The
> > Microformalyze way of doing things takes around 5 minutes per website.
> >
> > Now let's examine how we calculated statistics before:
> >
> > Here's how we did it via the Wiki:
> >
> > Every time a new property was created, I would have to go through and
> > tally the results by hand. This was error prone and on more than one
> > occasion, I had to wipe everything and start over.
> > It also required me
> > to triple-check my work to make sure I was reporting the correct
> > statistics to the list. I spent hours doing this - just calculating
> > statistics. There is a reason not many people help out with gathering
> > examples and calculating statistics - it is tedious and excruciatingly
> > time consuming.
> >
> > Here's how it was done using Microformalyze:
> >
> > You click a button and the statistics are automatically calculated for
> > you. You click another button and it dumps the wiki formatted text for
> > displaying the properties. It is no longer time consuming or error prone
> > to do this!
> >
> > However, the most important aspect of Microformalyze is that ANYBODY can
> > go back and validate our findings easily. The data files are there,
> > there is a common namespace across all properties/websites, in other
> > words: there is a verifiable paper trail.
> >
> > It is important to point out that this does not exist for any other
> > Microformat that I know about. Verifiability of analysis results is very
> > important! Reducing human error in statistics calculations is very
> > important! Microformalyze builds this into the examples gathering and
> > statistics calculation process.
> >
> > >> that helps the user track the properties on each page. It can
> > >> automatically calculate statistics and helped the process of analysis
> > >> immensely.
> > >
> > > This is my concern *HOW* does Microformalyze do this?
> > >
> > > Microformalyze has all the power of a high profile search engine that
> > > can output the relevance of a given keyword in order and frequency of
> > > occurrence correct?
> >
> > No, absolutely not. This is the core of your misunderstanding of what
> > Microformalyze does. There is no "search engine" or "keyword matching"
> > technology in Microformalyze. That would be a horrible way to go about
> > gathering examples.
> >
> > All Microformalyze does is automate the tedious and error-prone parts of
> > the examples and statistics gathering portion of the Microformats
> > process. It also adds verifiability - which is really its most
> > important contribution to the process.
>
> Sorry my friend I don't think I was being very clear
>
> *HOW* does Microformalyze do this?
>
> What Is a property?
> how is a property determined?,
> does Microformalyze Analyze the raw html to determine the existence of
> these properties? does it look for actual output on a web page?
>
> How does it gather statistics?
> how are they compared?, are they compared against other url's loaded
> into Microformalyze, or does it calculate the occurrence of a "property"
> on a page, or some other way?
>
> >
> > If you would like to see a detailed tutorial on how it works, the
> > tutorial is available here:
> >
> > http://wiki.digitalbazaar.com/en/Microformalyze#Tutorial
>
> Thanks for the tutorial but How do I use Microformalyze was not the
> question.
>
> >
> > I'd be happy to answer any other questions or concerns that you have
> > about Microformalyze. Like I said before, all of the data files, source
> > code (which I placed under the GPL), and documentation is available via
> > the website listed above. You don't have to take my word for it... you
> > could read the code, look at the data and see for yourself.
>
> I have had a look at the code but Python is not my strong point, Perhaps
> you might like to explain?
>
> I did a test, the "properties" I was Looking for were Baba and Flumps
> (because there is a good chance that these properties will NOT exist in
> any of the pages I'm likely to test)
>
> here is the test file (copy and paste if you like)
>
> property Baba The Elephant
> property Flumps A sweetie
> url Bazaar http://blog.digitalbazaar.com/
> properties Baba Flumps
>
> sorry to use your url but it was the first thing that sprung to mind :)
>
> I ticked both boxes in the GUI Baba and Flumps and outputted the data in
> the terminal
>
> Baba : 100.00%
> Flumps : 100.00%
>
>
> I looked at your page thinking "Huh" how can that be correct?
>
> In the web page text there is no mention of the words Baba or Flumps
>
> I looked at the source code No no Mention there either?
>
> Does Microformalyze determine the existence of these properties in
> another way?
>
> I added another url to examine
>
> property Baba The Elephant
> property Flumps A sweetie
> url Bazaar http://blog.digitalbazaar.com/
> properties Baba Flumps
> url no foo in this http://weborganics.co.uk/
> properties Baba
>
>
> the outputted data from the second url
>
> Baba : 100.00%
> Flumps : 50.00%
>
> I KNOW these two properties do not exist in any way at WebOrganics
>
> Can you see now WHY I am concerned and moderately confused
> Microformalyze does not seem to be calculating the existence of these
> properties on a page it seems to be Just calculating if I have ticked a
> box or not.
>
>
> Am I missing something?
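
The percentages quoted above are what you would get from a tool that only counts ticked checkboxes per URL, which matches Manu's description of the process. Here is a minimal sketch of that tally, assuming the data file format is as it appears in the test file (a `property` line declares a name, a `url` line starts a new page, and a `properties` line lists what was ticked for the most recent page). The function name `tally` and the parsing rules are hypothetical and inferred from this thread only; this is not Microformalyze's actual source.

```python
def tally(text):
    """Return {property: percent of URLs on which it was ticked}."""
    properties = []  # declared property names, in file order
    ticked = []      # one set of ticked property names per URL
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        keyword, rest = parts[0], parts[1:]
        if keyword == "property" and rest:
            properties.append(rest[0])   # first word is the name (assumed)
        elif keyword == "url":
            ticked.append(set())         # start a new page record
        elif keyword == "properties" and ticked:
            ticked[-1].update(rest)      # boxes ticked by the human reviewer
    total = len(ticked)
    return {
        name: (100.0 * sum(name in page for page in ticked) / total) if total else 0.0
        for name in properties
    }


SAMPLE = """\
property Baba The Elephant
property Flumps A sweetie
url Bazaar http://blog.digitalbazaar.com/
properties Baba Flumps
url no foo in this http://weborganics.co.uk/
properties Baba
"""

for name, percent in tally(SAMPLE).items():
    print(f"{name} : {percent:.2f}%")
```

Nothing in this sketch fetches or inspects the pages themselves: the statistic is the share of analysed URLs on which a reviewer ticked the property, so Flumps comes out at 50% because it was ticked on one of the two URLs.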
You know I am Impossibly Dense sometimes, Let's call it being overworked :)

Sorry Manu

Looking at the above now It seems I was making an ass of myself again

Wish I was better at apologising

Thanks

Martin

>
>
> Thanks
>
> Martin
>
> >
> > -- manu
> > _______________________________________________
> > microformats-new mailing list
> > microformats-new@microformats.org
> > http://microformats.org/mailman/listinfo/microformats-new