On Saturday, 12 August 2017 at 19:53:22 UTC, Faux Amis wrote:
I would like to get into D again by making a small program which fetches a website every X-time and keeps track of all changes within specified dom elements.

My dom.d and http2.d combine to make this easy:

https://github.com/adamdruppe/arsd/blob/master/dom.d
https://github.com/adamdruppe/arsd/blob/master/http2.d

plus a support file for handling assorted character encodings:

https://github.com/adamdruppe/arsd/blob/master/characterencodings.d


Or via dub:

http://code.dlang.org/packages/arsd-official

The dom and http subpackages are the ones you want.


Docs: http://dpldocs.info/arsd.dom


Sample program:

---
// compile: $ dmd thisfile.d ~/arsd/{dom,http2,characterencodings}

import std.stdio;
import arsd.dom;

void main() {
        auto document = Document.fromUrl("https://dlang.org/");
        writeln(document.optionSelector("p").innerText);
}
---

Output:

D is a general-purpose programming language with
        static typing, systems-level access, and C-like syntax.
It combines efficiency, control and modeling power with safety
        and programmer productivity.




Note that the https support requires OpenSSL to be available on your system. It works best on Linux with OpenSSL installed as a devel lib (openssl-devel or your distribution's equivalent, just as you would need if using it from C).



How it works:


Document.fromUrl uses the http lib to fetch the page, then automatically parses the contents as a DOM document. It corrects for common errors in webpage markup, character sets, etc.

Document and Element both have various methods for navigating, modifying, and accessing the DOM tree. Here, I used `optionSelector`, which works like `querySelector` in JavaScript (with the same selector syntax as CSS), returning the first matching element.

querySelector, however, returns null if nothing is found. optionSelector returns a dummy object instead, so you don't have to explicitly test it for null and can just access its methods.

`innerText` returns the text inside, stripped of markup. You might also want `innerHTML`, or `toString` to get the whole thing, markup and all.
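
To make the difference between these calls concrete, here is a minimal sketch. It parses a made-up inline snippet (the markup and the `p.intro` selector are assumptions for illustration, not part of the program above) so that it runs without network access. It builds with the same compile line as the sample program.

```d
import std.stdio;
import arsd.dom;

void main() {
    // Hypothetical inline markup so the sketch needs no network access.
    auto document = new Document(
        "<html><body><p class=\"intro\">Hello <b>world</b></p></body></html>");

    // querySelector returns null when nothing matches, so check before use.
    auto heading = document.querySelector("h1");
    if (heading is null)
        writeln("no <h1> in this document");

    // optionSelector returns a null-safe wrapper; on a miss, innerText
    // comes back as a null (empty) string instead of crashing.
    writeln("h1 text: '", document.optionSelector("h1").innerText, "'");

    auto intro = document.querySelector("p.intro");
    if (intro !is null) {
        writeln(intro.innerText);   // text only, markup stripped
        writeln(intro.innerHTML);   // the markup inside the <p>
        writeln(intro.toString());  // the <p> element itself, markup and all
    }
}
```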



There's a lot more you can do too, but I think just these few functions will be enough for your task.
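
For the task itself, one possible shape is a loop that fetches the page, reads the text of each watched selector, and compares it with the previous value. This is only a sketch under assumptions: the URL, the selector list, the one-minute interval, the three-round limit, and the `recordChange` helper are all made up for illustration.

```d
import core.thread : Thread;
import core.time : seconds;
import std.stdio;
import arsd.dom;

/// Hypothetical helper: stores `text` under `key` and returns true
/// when a previous, different value was already recorded.
bool recordChange(ref string[string] lastSeen, string key, string text) {
    bool changed = false;
    if (auto previous = key in lastSeen)
        changed = *previous != text;
    lastSeen[key] = text;
    return changed;
}

void main() {
    // Assumed settings; adjust to the site and elements you care about.
    string url = "https://dlang.org/";
    string[] selectors = ["p", "h1"];
    string[string] lastSeen;

    foreach (round; 0 .. 3) { // bounded so the sketch terminates
        auto document = Document.fromUrl(url);
        foreach (selector; selectors) {
            // optionSelector keeps this safe even if the element vanished.
            string text = document.optionSelector(selector).innerText;
            if (recordChange(lastSeen, selector, text))
                writeln("changed: ", selector);
        }
        Thread.sleep(60.seconds);
    }
}
```

A real program would probably persist `lastSeen` between runs and handle fetch failures; both are left out here.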


Bonus fact: levenshteinDistanceAndPath from the standard library makes a diff display of before and after pretty simple: http://dpldocs.info/experimental-docs/std.algorithm.comparison.levenshteinDistanceAndPath.1.html
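
As a rough illustration of that function, the sketch below runs it over two arrays of lines and walks the returned edit path to print a simple diff. The `renderDiff` helper, its "- " / "+ " prefixes, and the sample lines are assumptions for the example. Showing a substitution as a removal followed by an addition is just one way to display it.

```d
import std.algorithm.comparison : EditOp, levenshteinDistanceAndPath;
import std.stdio;

/// Hypothetical helper: renders a line-by-line diff.
/// "  " marks unchanged lines, "- " removed ones, "+ " added ones.
string[] renderDiff(string[] before, string[] after) {
    // result[0] is the edit distance, result[1] the sequence of edit operations.
    auto result = levenshteinDistanceAndPath(before, after);
    string[] lines;
    size_t i = 0, j = 0; // positions in `before` and `after`
    foreach (op; result[1]) {
        final switch (op) {
            case EditOp.none:
                lines ~= "  " ~ before[i];
                i++; j++;
                break;
            case EditOp.substitute:
                lines ~= "- " ~ before[i];
                lines ~= "+ " ~ after[j];
                i++; j++;
                break;
            case EditOp.insert:
                lines ~= "+ " ~ after[j];
                j++;
                break;
            case EditOp.remove:
                lines ~= "- " ~ before[i];
                i++;
                break;
        }
    }
    return lines;
}

void main() {
    // Made-up snapshots of the text of some watched elements.
    auto before = ["Welcome", "Version 1", "Contact us"];
    auto after  = ["Welcome", "Version 2", "Contact us"];
    foreach (line; renderDiff(before, after))
        writeln(line);
}
```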
