On Mon, Aug 6, 2012 at 12:59 AM, pancake <panc...@youterm.com> wrote:
>
> Did you tried with parsifal? Anyway.. My parser was simpler than all that
> xml-strict foo. So it worked too with corrupted and partially downloaded rss
> files.
>
> http://hg.youterm.com/mksend/file/14984ebd1529/parsifal
>
I'll investigate this, thanks! I agree that the downside of the current XML
parser I use is that it is a validating parser, meaning the XML has to be
well-formed. I will replace it with a non-validating parser at some point
though.

>> I like to use curl because it handles https, http redirection and also
>> allows me to pass the date of the latest update so HTTP caching will
>> work too. But curl can easily be replaced by wget or fetch though.
>
> I end up using wget and using local files with rss2html to process them.
> Depending on a library for this is imho not suckless

I agree; it doesn't depend on libcurl, just on the command-line curl. You can
easily replace this with wget or fetch like I said.

>>
>>> Actually, the only useful feature was the 'planet' option which
>>> sorts/merges all your feeds in a single timeline.
>> You can specify multiple feeds in a config file and run sfeed_update
>> with this config file as a parameter. Then pipe it through sfeed_html
>> .
>
> Config file for what? Specifying a list of feeds should not be in a config
> file. Maybe in a wrapper script or so.

I agree. sfeed_update is an optional wrapper script; it's the script I use
and I added it for convenience. You can write your own wrapper scripts around
sfeed (I know some people here prefer rc over sh, for example).

>
> Iirc A suckless way should be exporting a tsv where the first word of each
> line is the unix timestamp, so using sort -n should be more unix friendly.
>
> At the end a feed reading should just comvert from various crappy atom/rss
> formats to an unified tsv output. The rest can be done with grep, sort and
> awk. Even the html output
>

I somewhat agree, and this is what sfeed does. The optional sfeed_update
wrapper script does a little more than that though: it makes sure there are
no duplicates, groups items by feed name, etc. A snippet from the
sfeed_update script:

# merge raw files.
# merge(oldfile, newfile)
merge() {
	# unique sort by id, link, title.
	# order by feedname (asc), feedurl (asc) and timestamp (desc).
	(cat "$1" "$2" 2> /dev/null) | sort -t ' ' -u -k7,7 -k4,4 -k3,3 |
		sort -t ' ' -k10,10 -k11,11 -k1r,1
}

> I would suggest exporting json too. That will make templating work on client
> side and no need to do any templating system. Static html is good for lynx...
> Another option i would suggest is to put that template design in config.h

You can convert the tsv format to json; it should be very trivial.

> Can you specify filter for words? Grep will work here?

Definitely, you can grep -v the tsv feeds file or just the stdout of sfeed.

>>
>>> I also wanted to have a way to keep synced my already read links. But that
>>> was a boring task.
>>
>> Atm I just mark all items a day old or newer as new in sfeed_html and
>> sfeed_plain. In your browser visited links will ofcourse be coloured
>> differently.
>>
>
> The workflow i would like to have with feeds is:
>
> Fetch list of new stuff
> Mark them as:
> - uninteresting (stroke, possibly add new filtering rules)
> - read later (have a separate list of urls to read when i have time)
> - mark as read/unread.
> - favorite (flag as imprtant thing)
> - show/hide all news from a single feed
>
> I understand that this workflow shouldnt be handled by sfeed, because thats a
> frontend issue. But having html output does not allows me to do anything of
> that.

Sounds useful, and all of that should be possible, but you would need to
write additional scripts for it.
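An "unread" filter, for example, only needs a few lines of awk. A rough
sketch (not part of sfeed; the script name, the read.urls file and the
assumption that the link is the fourth TSV field, taken from the sort keys in
the merge() snippet above, are all mine):

#!/bin/sh
# unread: print only the items whose link does not appear in the read-urls
# file, i.e. keep only unread items.
# usage: unread read.urls < feeds.tsv > unread.tsv
awk -F '\t' -v readfile="$1" '
BEGIN {
	# load the list of already-read links into an array.
	while ((getline url < readfile) > 0)
		read[url] = 1
	close(readfile)
}
# field 4 is assumed to be the item link.
!($4 in read)
'

Marking items as read would then just be appending their links to read.urls;
favourites or hiding a single feed could be handled in the same grep/awk
style.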
> With json it would be easy to write a frontend like that easily in javascript
> ( blame me, but its fast and its everywhere). There's also a minimalist json
> parser named js0n that can do that from commandline too.
>
> But probably people in this list would expect an awk friendly format instead
> of json. (tsv can be easily converted to json)
>

I opted for an awk-friendly format, but I personally think json is a good
format for data exchange (much better than XML).

I hope that answers your questions; feel free to contact me on IRC too for a
faster answer :)
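PS: for the json conversion mentioned above, a rough awk sketch (not part of
sfeed; the field numbers follow the sort keys in the merge() snippet, the
file names are made up, and titles containing " or \ would still need proper
JSON escaping):

awk -F '\t' '
BEGIN { printf "[" }
{
	# fields: 1 = unix timestamp, 3 = title, 4 = link.
	if (NR > 1)
		printf ","
	printf "{\"timestamp\":%d,\"title\":\"%s\",\"link\":\"%s\"}", $1, $3, $4
}
END { printf "]\n" }
' feeds.tsv > feeds.json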